Insurance claim forecasting system

ABSTRACT

A computer-implemented process is disclosed for developing a person-level cost model that forecasts future costs attributable to claims from members of a book of business, where person-level data are available for a substantial portion of the members of the book of business for an actual underwriting period and the forecast of interest is for a policy period. The process uses development universe data comprising person-level enrollment data, historical base period health care claims data and historical next period claim amount data for a statistically meaningful number of individuals. The process also provides at least one claim-based risk factor for each historical base period claim, based on the claim code associated with the health care claim, and at least one enrollment-based risk factor based on the enrollment data. Finally, the process develops a cost forecasting model by applying an interaction capturing technique to the development universe data, thereby capturing the predictive ability of the main effects and interactions of the claim-based and enrollment-based risk factors.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is based on provisional applications 60/249,060, filed Nov. 15, 2000, and 60/267,131, filed Feb. 7, 2001, which are incorporated by reference herein.

REFERENCE TO PROGRAM LISTINGS

A computer program listing appendix has been submitted on compact disc for this disclosure. The material on that compact disc is incorporated by reference herein. The compact disc was filed in two copies and contains the following file:

NAME OF FILE    DATE OF CREATION    SIZE IN BYTES
APPENDIX.TXT    May 14, 2001        281,991

The name above is the name of the file on the compact disc, the date is the date the file was created on the compact disc, and the size in bytes is the size of the file. Please note that there is a glossary of terms included at the end of the Background section.

BACKGROUND OF THE INVENTION

This invention pertains to health, disability and life insurance systems, particularly including processing data (in the business of health insurance) for estimating future costs or liability and setting optimal pricing. For convenience, we call one embodiment of our invention More Accurate Predictions for Health Insurance Premiums, or MAP4HIP.

Group health insurance is typically priced through a series of steps. Historical claims costs are calculated by summing the costs of insured individuals. Actuaries estimate what the general cost inflation trend will be next period. If an insured group is large enough to have credible experience (historical costs), the inflation trend may be applied to the historical claims experience to produce an estimate of the expected claims for next period. A profit margin and administrative costs are added to the expected group claims costs to produce the so-called “experience rate”. An underwriter reviews the group's experience and adjusts the cost- and profit-margin-based price depending on special circumstances and competitive pressure. The standard practice is to use group-level data for estimating costs and setting prices, except for very small groups, individual policies or specific medical stop loss insurance. Information on the insured's (i.e., individual's) medical conditions is typically not used when group-level data are used for underwriting and pricing the group's aggregate cost forecast.

The current standard practice for estimating future health care costs for groups of 50 or more employees plus their dependents uses one of two methods, or a combination of them. If the group is large enough to have credible, stable experience, the historical costs are assumed to be the best estimate of next period's costs after a cost trend factor for inflation has been included. If the group is too small to have credible historical costs, many groups are combined and averaged so that a stable demographic look-up table of historical average costs by age group, gender and family size can be developed and used as a weighting mechanism for estimating the expected future costs of non-credible groups. Cost trend factors for inflation are then applied. If a group's experience is neither completely credible nor completely non-credible, a blended average of its experience and a demographic look-up table forecast is used. These standard actuarial methods account for neither person-level trends in historical costs nor medical information about the person.
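The blending of experience and look-up table forecasts described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a prescribed actuarial method: the credibility weight `z`, the trend rate and the table cells are hypothetical inputs, and how `z` is derived from group size is left open.

```python
from typing import Dict, List, Tuple

# (age group, gender, family size) cell of a demographic look-up table.
Cell = Tuple[str, str, int]


def manual_cost(members: List[Cell], table: Dict[Cell, float]) -> float:
    """Look-up table forecast: sum the pooled average cost of each member's cell."""
    return sum(table[cell] for cell in members)


def blended_forecast(experience_cost: float, table_cost: float,
                     z: float, trend: float) -> float:
    """Credibility-weighted blend of a group's own experience and its
    look-up table forecast, then trended for cost inflation.

    z = 1.0: fully credible (experience only); z = 0.0: non-credible
    (look-up table only); values in between give the blended average.
    """
    if not 0.0 <= z <= 1.0:
        raise ValueError("credibility weight z must lie in [0, 1]")
    return (z * experience_cost + (1.0 - z) * table_cost) * (1.0 + trend)


# Hypothetical pooled averages (annual cost per member) and a three-member group.
TABLE = {("18-44", "F", 1): 3000.0, ("18-44", "M", 1): 2000.0,
         ("45-64", "F", 2): 6000.0}
GROUP = [("18-44", "F", 1), ("18-44", "M", 1), ("45-64", "F", 2)]

example = blended_forecast(13000.0, manual_cost(GROUP, TABLE), z=0.5, trend=0.10)
```

With these hypothetical numbers the look-up table forecast is $11,000, and a 50/50 blend with $13,000 of experience, trended at 10%, gives about $13,200.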

Small groups (i.e., 50 or fewer employees plus their dependents) or individual medical policies may use medical questionnaires from initial enrollment applications as input to an underwriter for estimating next period's group-level costs. Manual underwriting is expensive because it is labor-intensive, and it is prone to variability among underwriters because their experience varies.

Some state Medicaid HMO programs (e.g., Colorado and Maryland) and federal Medicare HMO programs use statistical algorithms that make person-level cost forecasts based on diagnoses from computerized medical bills and on demographic factors. These “risk adjustment” methods do not use procedures or historical person-level costs, because the governments do not want to create incentives for increased utilization of services and increased spending. The governments' intent for HMO payments or managed care is to make payments proportional to the insured population's need for care based on its health conditions, not on prior care. However, historical cost is the single best predictor of future medical cost for credible groups, and not using it as part of the forecasting method decreases the accuracy of the forecast.

Some medical insurance companies may be using such “risk adjustment” algorithms, which Medicare, Medicaid and others intended for managed care cost forecasting or payment allocation. It would be desirable, however, to use historical costs, types of services and procedures, as well as diagnoses, demographics and combinations of these variables, prospectively to produce more accurate cost forecasts than “risk adjustment” algorithms that use only diagnoses and demographic factors.

There are person-level diagnosis and procedure models that measure the efficiency of medical practices (i.e., costs of care given the patient's conditions). These models are typically concurrent or retrospective in nature, not prospective. Symmetry's ETGs are a good example of this class of models. They lack cost experience as a predictor, since cost is intended as the dependent variable, and they may also limit the use of demographic variables. Forecasting models that are prospective, rather than designed for concurrent or retrospective analysis, would be desirable. The methods of the present invention can also be applied to concurrent data to develop models for efficiency analysis, as will be described.

Stop loss health (or medical) insurance is typically purchased by self-insured employers that wish to limit their medical expense exposure. The most common form of medical stop loss insurance is known as “specific stop loss” insurance, which is a high-deductible (usually $25,000 to $100,000) insurance policy per insured person. Specific stop loss medical insurance is designed to protect the employer or other payer from large catastrophic medical expenses, such as those incurred for liver transplants or for care of neonates with major repairable congenital anomalies. The standard method for underwriting specific stop loss medical insurance uses a demographic look-up table to estimate costs for individuals whose medical expenses were under 50% of the deductible in the previous year. If an insured's medical expenses were over a predetermined amount, such as 50% of the specific deductible, the insured's medical records are reviewed manually by an underwriter, and next year's costs are estimated by the underwriter or by a doctor or nurse using their experience and expert opinion. Manual medical underwriting for specific stop loss has the same problems as manual underwriting for small group medical insurance: it is expensive and prone to underwriter variability.
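The per-person mechanics of specific stop loss, and the 50%-of-deductible screen used in current manual underwriting, can be sketched as follows. The person identifiers, costs and deductible are hypothetical; the 0.5 fraction is the example threshold given above and is exposed as a parameter.

```python
from typing import Dict, List


def specific_payments(costs: Dict[str, float], deductible: float) -> Dict[str, float]:
    """Insurer's specific stop loss payment per person: cost above the deductible."""
    return {person: max(0.0, cost - deductible) for person, cost in costs.items()}


def manual_review_list(prior_costs: Dict[str, float], deductible: float,
                       fraction: float = 0.5) -> List[str]:
    """Persons whose prior-year cost exceeded `fraction` of the deductible.

    Under the standard practice described above, these persons' records are
    reviewed by an underwriter; everyone else is priced from a look-up table.
    """
    threshold = fraction * deductible
    return sorted(p for p, c in prior_costs.items() if c > threshold)


# Hypothetical prior-year costs and a $50,000 specific deductible.
PRIOR = {"A": 4_000.0, "B": 30_000.0, "C": 120_000.0}
review = manual_review_list(PRIOR, deductible=50_000.0)
```

Here persons B and C exceed the $25,000 screen and would go to manual review, while only C's cost above the deductible ($70,000) would be paid by the specific coverage.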

Frequently, “aggregate stop loss medical insurance” coverage is also purchased by the employer. Aggregate coverage (exclusive of specific payments) means that the insurer will pay the employer's or other payer's medical cost obligations for a covered group if those costs exceed an agreed-upon amount (i.e., an “attachment point”). The attachment point is typically defined as 125% of the group's expected cost in the insured period. The industry standard for calculating the expected cost is substantially the same method as used for fully insured plans. In other words, if the group is large enough to have completely credible experience, last year's experience is adjusted by forecast inflation and increased by 25% to produce the 125% attachment point. If the group's experience is partially credible, a weighted combination of experience and the demographic look-up table model is adjusted by an inflation forecast and increased by 25% to calculate the attachment point. When the group is too small to have credible experience, the demographic look-up table model is used as the starting point, which is then trended for inflation and increased by 25% to calculate the attachment point. Aggregate-only medical stop loss insurance has recently been offered by one company (Cairnstone) to credible groups, and we believe that it uses group-level experience plus trended inflation to estimate future costs. Price is usually determined by competitive pressure, but the inventors are not familiar with the proprietary techniques used by the insurers.
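The 125% attachment point and the resulting aggregate payment can be sketched as follows. The sketch assumes, for illustration, that the expected cost has already been estimated (from experience, a look-up table or a blend) and that any specific stop loss payments are known; the 1.25 factor is the typical value cited above and is a parameter.

```python
def attachment_point(expected_cost: float, trend: float,
                     factor: float = 1.25) -> float:
    """Trend the expected cost for inflation, then apply the 125% factor."""
    return expected_cost * (1.0 + trend) * factor


def aggregate_payment(total_cost: float, specific_paid: float,
                      attach: float) -> float:
    """Aggregate stop loss payment.

    Costs already reimbursed under specific stop loss are not counted
    against the attachment point; the insurer pays the counted costs
    in excess of that point.
    """
    counted = total_cost - specific_paid
    return max(0.0, counted - attach)


# Hypothetical group: $1,000,000 expected cost, 8% inflation trend.
ap = attachment_point(1_000_000.0, 0.08)
```

With these hypothetical figures the attachment point is $1,350,000, so a year with $1,500,000 of claims, $70,000 of which were paid under specific coverage, would trigger an aggregate payment of $80,000.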

We include a glossary of the terms used in describing the invention so that our description is precise. Additionally, SAS computer code and CART modeling language will be included to provide concrete examples of the implementation of the process or products. The software Appendix found on the compact disc filed with the present disclosure contains computer code (minus copyrighted formats) of a simpler embodiment of the invention. That code is in SAS and S-PLUS, and the regression tree used is RPART. Details are provided for the fully insured renewal product. The aggregate-only stop loss product uses the same steps for cost estimation. The short-term disability, long-term disability and life insurance products use the same forecasting techniques, but the dependent variables are changed to reflect the insurance type.

GLOSSARY OF TERMS

1. Aggregate-only stop loss health insurance—A health insurance product for self-funded employers that want to cap their maximum liability. The aggregate-only policy pays costs above an agreed-upon limit (i.e., the attachment point). Usually the attachment point is 125% of expected costs, but it could be 110% or some other amount. The expected costs are estimated using an embodiment of this invention or using standard actuarial methods. Aggregate-only stop loss does not include specific stop loss. However, specific coverage can be combined with aggregate stop loss; in that case the specific payments are not included in the costs counted against the aggregate attachment point.

2. Base Period—A period of typically 12 consecutive months prior to the lag period during which services were provided to some enrollees and reflected by claims entered in a computer file. In practice, it may be more or less than 12 months. Risk factors are coded on data from the base period, and these data are used to forecast the next period costs. In other words, these data are used to calculate the predictors for the development model and are not used for underwriting actual health insurance policies.

3. Book of Business—The insurance of a given type (e.g., small group, individual, large group) for all persons covered by an insurer at a point in time or during a specified period. An insurer may have multiple books of business.

4. Bias Test—A comparison of observed to predicted values from a model. Summed over the total population that served as the standard in the preparation of the model, the observed and predicted totals are equal. Bias tests determine whether there is any meaningful systematic disparity between observed and predicted cost when persons are sorted by predicted values, age, family composition or other characteristics. Such disparities are considered bias, which better models eliminate or reduce. A related measure sorts by the actual rather than the predicted values and is a measure of the accuracy of the forecasts.

5. Candidate Predictor Variable—An array of variables derived from the CI (client insurer) database and made available to the statistical software, which selects those that are most predictive of the dependent variable (e.g., by stepwise OLS or CART regression trees).

6. Claim amount: The total cost, or the payments made by the insurer.

7. Claim codes: These include ICD-9-CM diagnosis and procedure codes, CPT codes, National Drug Codes and values from other standardized coding systems such as SNOMED codes.

8. Claim-based risk factors: Risk factors derived from the claim code, the claim amount and transformations of the claim amount, the type and place of service, the provider type, units of service and other information contained on a health care claim. These risk factors are present in either the base or the underwriting period.

9. Clinical risk factors: Risk factors derived from the claim codes, type and place of service and provider type, but not solely from the claim amount.

10. Client Insurer (CI)—The insurance entity to which the invention is to be applied.

11. Concurrent Cost Models—Used synonymously with Retrospective Cost Models, which are defined below.

12. Costs of health care—Measured in dollars (usually per person per day in this application), and may be defined as either of the following:

a. Claims—Total bills for care submitted to the insurer for reimbursement.

b. Payments—The amounts actually paid by the insurer. Payments are always less than the claims due to deductibles, benefits and non-covered services.

13. Cost Inflation—Used synonymously with cost trend. The secular trend in costs per person for health care due to changes in practice patterns and price per service. It does not usually consider changes in a population's health care needs, which are usually minimal in the short run. It differs from pure price inflation such as that measured by the consumer price index (CPI).

14. Credibility—The degree to which a group's experience may confidently be used as the basis for future rates.

15. Demographic look-up table—A method used by actuaries to estimate group-level costs when the group is too small to have credible experience. Costs are pooled across a large number of groups, and average costs are calculated for each cell of a table of age by sex by family composition or other similar demographics. The appropriate cell amounts are applied to each person or employee in a non-credible group and summed to calculate the group's expected cost.

16. Dependent Measure—The quantity forecast by the model through application of the interaction capturing technique. A transformation may be applied to the dependent measure to calculate the claim amount (e.g., multiplying a probability by an average cost). For health insurance and medical stop loss insurance, the dependent measure is the future cost of health care for the population that comprises the CI book of business at the time the rates are to be quoted. For short-term disability, the dependent measure is disability days. For long-term disability and life insurance, the dependent measure is the probability of the event.

17. Enrollment-based risk factors—Risk factors that are derived from the enrollment information only, such as age, sex, relationship to the enrollee, length of enrollment, geographic locale and type of coverage; they do not include claim information or claim amount. The employee's salary, disability coverage terms and term life insurance coverage terms may also be included in the enrollment file.

18. Experience model—A method used by actuaries for estimating next year's cost at the group level. If the group is deemed credible, last year's cost (or experience) is considered to be the best estimate of next year's cost. A cost trend is applied to account for medical inflation in next year's cost.

19. Group—A collection of one or more people who are covered by one insurance policy. A traditional group is a collection of employees, and their dependents, who work for an employer at a location. A group can also be an individual or a family purchasing an “individual” health insurance policy, under which the remaining immediate family may also be covered.

20. Health Insurance—Insurance for the array of benefits covered by the health insurance policies of the client insurance company or a self-insured company, including hospital, surgical and medical care, plus drug benefits for some plans. Medical insurance is used as a synonym.

21. Hybrid Tree Analysis—The use of regression trees (or other analytic method output) as input to other regression models such as OLS, median and logistic regression, or neural networks. Additionally, a model's output (e.g., from a regression or neural network) may be used as input into the regression or probability tree.

22. Interaction Capturing Technique—A mathematical and logical transformation of independent variables that predicts a response or dependent variable. The interaction capturing technique includes main effects, interaction effects and possibly time series effects. Statistical techniques that are examples of interaction capturing techniques include, but are not limited to, ANOVA, regression methods (e.g., linear, logistic, shrinkage, robust, ridge), regression trees, moving averages and autoregressive moving averages, look-up tables, means, probability models, clustering algorithms and many other methods. Data mining techniques that are examples of interaction capturing techniques include, but are not limited to, decision trees, rule induction, genetic algorithms, neural networks, nearest neighbor and other data mining methods.

23. Lag Period—A period between the base period and the next period, or between the underwriting period and the policy period, which is required because of delays in filing claims, preparing or revising model weights, calculating premium rates and submitting them to insured groups in a timely way.

24. MAP4HIP—An acronym for More Accurate Predictions for Health Insurance Premiums, which in turn is a brief title for our invention in its application to health insurance.

25. Next Period—Typically a period of 12 consecutive months subsequent to the base period and the lag period that contains the data comprising the dependent variable used in the development model. Actual insurance policies are not written for this period; they are underwritten for the policy period.

26. Policy Period—Typically a period of 12 consecutive months subsequent to the underwriting period and the lag period that contains the data comprising the actual cost borne by the insurer. These costs are forecast by applying the development model to the data from the underwriting period, with appropriate adjustments made for assumptions about inflation.

27. Prospective Cost Models—Models in which the candidate predictor variables relate to a time period that precedes the dependent variable.

28. Retrospective Cost Models—Models in which the candidate predictor variables relate to the same time period as the dependent variable.

29. Specific stop loss health insurance—Health insurance coverage for self-funded employers or other payers that has a very high deductible per person. Usually the deductible is at least $10,000 and may be as high as $500,000 per person. Typically the deductible is between $25,000 and $100,000 per person, and the coverage is meant to pay for catastrophic care.

30. Standard population—The cases in the data set that are used to select predictor variables and to weight them by their relation to the dependent variable. For this invention, the cases are an insured population.

31. Subscriber unit—The family unit by which the health insurance premium is charged. The simplest classification has two units: 1) a single person and 2) two or more people. Single person, married couple, and three or more people is a common classification, but more detailed versions are also used. The subscriber is the employee.

32. Third Party Administrator or TPA—A company that processes the health insurance claims for a self-funded employer. The TPA may or may not be part of an insurance company.

33. Underwriting Period—A period of typically 12 consecutive months prior to the lag period during which services were provided to some enrollees and reflected by claims entered in a computer file. In practice, it may be more or less than 12 months. Risk factors are coded on data from the underwriting period, and these data are used to forecast the policy period costs. In other words, these data are used to calculate the predictors for the model that is used for underwriting actual health insurance policies.

34. Winsorize—Data are Winsorized if the most extreme observations on one or both ends of the ordered sample are replaced by the nearest retained observation. Our cost distributions have no low-cost outliers, and hence Winsorization is applied only to the high end of the ordered sample.
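The bias test in item 4 can be sketched in Python as follows. It is a minimal version under simple assumptions: persons are sorted by predicted cost and split into groups of nearly equal size (ten by default, a hypothetical choice), and for each group the observed and predicted totals and their ratio are reported. Passing the observed values as the sort key gives the related accuracy measure described in the same item.

```python
from typing import List, Optional, Sequence, Tuple


def bias_test(observed: Sequence[float], predicted: Sequence[float],
              n_groups: int = 10,
              sort_key: Optional[Sequence[float]] = None
              ) -> List[Tuple[float, float, float]]:
    """Return (observed total, predicted total, observed/predicted) per group.

    Persons are ordered by `sort_key` (predicted values by default) and split
    into `n_groups` groups of nearly equal size. Ratios far from 1.0 signal
    systematic bias for that slice of the population.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    key = predicted if sort_key is None else sort_key
    order = sorted(range(len(key)), key=lambda i: key[i])
    n = len(order)
    rows = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not idx:
            continue  # more groups than persons
        obs = sum(observed[i] for i in idx)
        pred = sum(predicted[i] for i in idx)
        rows.append((obs, pred, obs / pred if pred else float("nan")))
    return rows
```

The same function works for other sort variables (age, family composition) by passing them as `sort_key`, provided they are numeric or otherwise orderable.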

BRIEF SUMMARY OF THE INVENTION

One aspect of the invention contemplates a computer-implemented process of developing a person-level cost model for forecasting future costs attributable to claims from members of a book of business, where person-level data regarding actual base period health care claims are available for a substantial portion of the members of the book of business for an actual underwriting period, and the forecast of interest (i.e., future claim amount) is for an actual policy period which can be, but is not necessarily, contiguous with the actual underwriting period, having the steps of:

providing development universe data comprising person-level enrollment data, historical base period health care claims data and historical next period claim amount data for a statistically meaningful number of individuals, where the person-level data on a health care claim comprise at least a claim code and a claim amount;

providing at least one claim-based risk factor for each historical base period claim based on the claim code associated with the health care claim, and providing at least one enrollment-based risk factor based on the enrollment data; and

developing a cost forecasting model by capturing the predictive ability of the main effects and interactions of claim-based risk factors and enrollment-based risk factors with the development universe data, through the application of an interaction capturing technique to the development universe data.
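The three steps above can be sketched in Python under simplifying assumptions. The claim-code mapping, age bands and field names are hypothetical illustrations. The interaction capturing technique shown is a look-up table of cell means over the full cross-classification of risk factors (look-up tables and means are among the techniques listed in the glossary), chosen because it is short and self-contained; the appendix embodiment instead uses RPART regression trees, which pool sparse cells rather than falling back to an overall mean.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical claim code -> claim-based risk factor mapping (ICD-9-CM
# diagnosis codes here); a real encoder would cover complete code sets.
CODE_TO_FACTOR = {"250.00": "diabetes", "401.9": "hypertension"}
FACTORS = ("diabetes", "hypertension")


def age_band(age: int) -> str:
    """Enrollment-based risk factor: coarse age band."""
    if age >= 65:
        return "65+"
    return "45-64" if age >= 45 else "<45"


def encode(person: dict) -> Tuple:
    """Combine enrollment-based and claim-based risk factors into one cell key."""
    present = {CODE_TO_FACTOR[c] for c in person["base_claim_codes"]
               if c in CODE_TO_FACTOR}
    return (age_band(person["age"]), person["sex"]) + tuple(
        f in present for f in FACTORS)


def fit_cell_means(universe: List[dict], min_cell: int = 1) -> Dict:
    """Mean next period cost for every observed combination of risk factors.

    Each cell is a unique combination of all risk factors, so the cell means
    capture main effects and all of their interactions at once. Cells with
    fewer than `min_cell` persons are dropped.
    """
    cells = defaultdict(list)
    for p in universe:
        cells[encode(p)].append(p["next_cost"])
    return {
        "table": {k: mean(v) for k, v in cells.items() if len(v) >= min_cell},
        "overall": mean(p["next_cost"] for p in universe),
    }


def predict(model: Dict, person: dict) -> float:
    """Look up the person's cell; fall back to the overall mean if unseen."""
    return model["table"].get(encode(person), model["overall"])
```

Applying `predict` to underwriting period data for each member yields the person-level policy period forecasts used in the aspects that follow.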

A further aspect of the invention contemplates a computer-implemented process wherein the interaction capturing technique is selected from the group consisting of median regression tree techniques, least squares regression tree techniques, rule induction techniques, ordinary least squares regression techniques, median regression techniques, robust regression techniques, genetic algorithms, clustering techniques and neural network techniques.

Yet another aspect of the invention is a computer-implemented process wherein the person-level next period cost forecasts are adjusted by modifying the extant cost forecast by the expected cost trend.
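A minimal sketch of the trend adjustment follows. It assumes an annual trend rate compounded over a projection horizon expressed in months; both the compounding convention and the horizon are illustrative assumptions rather than details taken from this disclosure.

```python
from typing import List


def trend_adjust(forecasts: List[float], annual_trend: float,
                 months: float) -> List[float]:
    """Scale person-level cost forecasts by the expected cost trend,
    compounding the annual rate over `months` months."""
    factor = (1.0 + annual_trend) ** (months / 12.0)
    return [f * factor for f in forecasts]
```

For example, a $1,000 forecast trended at 10% per year over a hypothetical 24-month horizon becomes about $1,210.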

A yet further aspect of the invention is a computer-implemented process wherein the data from the claims used as predictors consist essentially of the claim- and enrollment-based risk factors, the claim amount is a standardized cost of services provided, and the model is used to allocate prospective payments to health care providers.

A still further aspect of the invention is a computer-implemented process wherein the data used from the claims data consist essentially of the claim code and selected mandatory procedures, the claim amount is a standardized cost of services provided during the same time period as the base period, and the model is used to evaluate the efficiency of health care providers.

Another aspect of the invention is a computer-implemented process of forecasting future claim amounts attributable to claims from members of a book of business for an actual policy period, wherein the model development universe comprises data from the members of a book of business to be insured, further comprising:

applying the cost-forecasting model to the actual underwriting period person-level data of each of the members of the book of business to generate a person-level actual policy period cost forecast for each member of the book of business; and

producing a group-level forecast for the actual policy period from the person-level forecasts of each member of the group by totaling the person-level actual policy period cost forecasts for the group.
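Totaling person-level forecasts into group-level forecasts is a simple aggregation. A minimal sketch, assuming each person-level forecast arrives as a hypothetical (group_id, person_id, forecast) record:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def group_totals(records: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """Sum person-level policy period forecasts into one forecast per group.

    Each record is a hypothetical (group_id, person_id, forecast) tuple.
    """
    totals: Dict[str, float] = defaultdict(float)
    for group_id, _person_id, forecast in records:
        totals[group_id] += forecast
    return dict(totals)
```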

Yet another aspect of the invention is a computer-implemented process comprising the step of setting insurance reserves based on the group-level forecast for the actual policy period, wherein the policy period is a reserving period for claims that have not occurred or that have occurred but have not been reported.

A further aspect of the invention is a computer-implemented process wherein claim amounts are a mix of fee-for-service payments and capitation payments, so that the base and underwriting period risk factors are appended with dummy variables for the presence of capitation payments by provider type, and the cost estimate in the next and policy periods is the fee-for-service cost, which must be supplemented with the expected capitation payments.
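One way to realize the capitation adjustment is sketched below. The provider types, the per-member-per-month (PMPM) capitation rates and the 12-month policy period are hypothetical assumptions; in this sketch the dummy variables would be added to the risk factors, the model would forecast only the fee-for-service cost, and the expected capitation is added afterward.

```python
from typing import Dict, Iterable, List

# Hypothetical provider types that may be paid by capitation.
PROVIDER_TYPES = ("primary_care", "mental_health", "laboratory")


def capitation_dummies(capitated_types: Iterable[str]) -> List[int]:
    """Dummy variables (1/0), one per provider type, appended to a person's
    base or underwriting period risk factors."""
    present = set(capitated_types)
    return [1 if t in present else 0 for t in PROVIDER_TYPES]


def total_expected_cost(ffs_forecast: float, capitated_types: Iterable[str],
                        pmpm_rates: Dict[str, float], months: int = 12) -> float:
    """Supplement the model's fee-for-service forecast with the capitation
    payments expected for the person's capitated provider types."""
    capitation = sum(pmpm_rates[t] for t in set(capitated_types)) * months
    return ffs_forecast + capitation
```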

Still another aspect of the invention is a computer-implemented process of developing a hybrid person-level health care claim cost forecasting model for forecasting future medical costs attributable to health care claims from members of a book of business, where person-level data are available for a substantial portion of the members of the book of business, comprising the steps of:

providing development universe data comprising person-level data for a statistically meaningful number of individuals, the person-level data comprising continuous variable data and categorical variable data;

first processing the continuous variable data for each individual with a continuous processing technique that captures the predictive ability of main effects and interactions of continuous variables to generate a person-level continuous variable model; and

processing the categorical variable data for each individual, including the output from the continuous processing technique, with a categorical processing technique that captures the predictive ability of main effects and interactions of categorical variables to generate a person-level categorical variable model;

wherein the person-level continuous variable model and the person-level categorical variable model together comprise a hybrid person-level health care claim amount forecasting model.
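The two-stage hybrid model can be sketched as follows, under simplifying assumptions: the continuous stage is an ordinary least squares line on a single continuous variable (for example, base period cost), its output is cut into bands at hypothetical cut points, and the categorical stage is a look-up table of mean next period cost by (band, category). Any of the interaction capturing techniques listed in the glossary could replace either stage.

```python
from collections import defaultdict
from collections.abc import Hashable
from statistics import mean
from typing import Dict, Sequence, Tuple


def fit_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Continuous stage: least squares intercept a and slope b for y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def band(value: float, cuts: Sequence[float]) -> int:
    """Turn the continuous-stage output into an ordinal category."""
    return sum(value >= c for c in cuts)


def fit_hybrid(x: Sequence[float], cats: Sequence[Hashable],
               y: Sequence[float], cuts: Sequence[float]) -> Dict:
    """Categorical stage: mean outcome by (continuous-stage band, category)."""
    a, b = fit_ols(x, y)
    cells = defaultdict(list)
    for xi, ci, yi in zip(x, cats, y):
        cells[(band(a + b * xi, cuts), ci)].append(yi)
    return {"a": a, "b": b, "cuts": list(cuts), "overall": mean(y),
            "table": {k: mean(v) for k, v in cells.items()}}


def predict_hybrid(model: Dict, xi: float, ci: Hashable) -> float:
    """Apply both stages; unseen (band, category) cells get the overall mean."""
    yhat = model["a"] + model["b"] * xi
    key = (band(yhat, model["cuts"]), ci)
    return model["table"].get(key, model["overall"])
```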

Yet another aspect of the invention is a computer-implemented process of developing a claim amount forecasting model for use in forecasting the future claim amount for members of a book of business, where person-level data are available for a substantial portion of the members of the book of business for an actual base period, and the claim amount of interest for forecasting purposes is for an actual next period which can be, but is not necessarily, contiguous with the actual base period, comprising the steps of:

processing the base period data having claims to generate a having-claims claim amount forecasting model; and

processing the base period data without claims to generate a without-claims claim amount forecasting model,

wherein the having-claims claim amount forecasting model and the without-claims claim amount forecasting model together comprise a claim amount forecasting model.
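A sketch of the having-claims/without-claims split follows. It assumes a hypothetical `base_claims` list per person and, for brevity, uses mean next period cost by sex as a stand-in model fitted to each partition; in practice each partition would get its own interaction capturing model and candidate predictors, since persons without claims have no claim-based risk factors.

```python
from statistics import mean
from typing import Callable, Dict, List

Model = Callable[[dict], float]


def fit_mean_by_sex(rows: List[dict]) -> Model:
    """Stand-in model: mean next period cost by sex, else the partition mean."""
    overall = mean(r["next_cost"] for r in rows)
    by_sex: Dict[str, float] = {
        s: mean(r["next_cost"] for r in rows if r["sex"] == s)
        for s in {r["sex"] for r in rows}
    }
    return lambda person: by_sex.get(person["sex"], overall)


def fit_split_model(universe: List[dict],
                    fit: Callable[[List[dict]], Model] = fit_mean_by_sex) -> Model:
    """Fit separate models to persons with and without base period claims and
    route each person to the matching model at prediction time.
    Both partitions must be non-empty."""
    with_claims = [p for p in universe if p["base_claims"]]
    without_claims = [p for p in universe if not p["base_claims"]]
    having_model, without_model = fit(with_claims), fit(without_claims)

    def predict(person: dict) -> float:
        if person["base_claims"]:
            return having_model(person)
        return without_model(person)

    return predict
```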

Yet another aspect of the invention is a computer-implemented process of developing a health care claim amount forecasting model for use in forecasting the future medical claim amount for members of a book of business, where person-level data are available for a substantial portion of the members of the book of business for an actual base period, and the claim amount of interest for forecasting purposes is for an actual next period which can be, but is not necessarily, contiguous with the actual base period, comprising the steps of:

providing development universe data comprising person-level data for a statistically meaningful plurality of individuals, wherein the person-level data for an individual comprise health care claims data for the individual and the data on a health care claim comprise at least a claim amount and a claim code;

Winsorizing the person-level data to yield inlier data and outlier data;

processing the inlier data to generate an inlier cost forecasting model; and

processing the outlier data to generate an outlier cost forecasting model;

wherein the results of the inlier and outlier cost forecasting models are combined to produce a person-level claim amount forecast model.
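A minimal sketch of the inlier/outlier decomposition follows, under assumptions not taken from this disclosure: each person's cost is split into its Winsorized part (capped at the largest retained observation, per glossary item 34) and its excess above the cap; the inlier model is a table of mean capped cost by a hypothetical cell key; the outlier model is the probability of exceeding the cap times the mean excess (the probability-times-average-cost transformation mentioned in glossary item 16); and the combination is their sum.

```python
from collections import defaultdict
from collections.abc import Hashable
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple


def cap_for_top_k(costs: Sequence[float], k: int) -> float:
    """Winsorizing the k largest costs replaces them with the (k+1)-th largest,
    which therefore serves as the cap. Requires k < len(costs)."""
    return sorted(costs)[-(k + 1)]


def winsorize_split(costs: Sequence[float],
                    cap: float) -> Tuple[List[float], List[float]]:
    """Split each cost into its inlier (capped) part and its excess over the cap."""
    return [min(c, cap) for c in costs], [max(0.0, c - cap) for c in costs]


def cell_means(keys: Sequence[Hashable],
               values: Sequence[float]) -> Dict[Hashable, float]:
    """Mean value per cell key."""
    cells = defaultdict(list)
    for k, v in zip(keys, values):
        cells[k].append(v)
    return {k: mean(v) for k, v in cells.items()}


def fit_inlier_outlier(keys: Sequence[Hashable], costs: Sequence[float],
                       cap: float) -> Callable[[Hashable], float]:
    inlier, excess = winsorize_split(costs, cap)
    inlier_model = cell_means(keys, inlier)
    # Outlier model: P(cost > cap) per cell times mean excess when it occurs.
    prob = cell_means(keys, [1.0 if e > 0 else 0.0 for e in excess])
    hits = [(k, e) for k, e in zip(keys, excess) if e > 0]
    severity = cell_means([k for k, _ in hits], [e for _, e in hits])

    def predict(key: Hashable) -> float:
        return inlier_model[key] + prob[key] * severity.get(key, 0.0)

    return predict
```

With saturated cell means, as here, the combined forecast reproduces each cell's mean raw cost; the benefit of the split comes when the inlier and outlier parts are modeled with different techniques or predictors.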

Another aspect of the invention is a computer-implemented process further comprising:

Winsorizing the inlier data to yield inlier data having claims and inlier data without claims;

processing the inlier data having claims to generate an inlier-having-claims claim amount forecasting model; and

processing the inlier data without claims to generate an inlier-without-claims claim amount forecasting model,

wherein the inlier-having-claims claim amount forecasting model and the inlier-without-claims claim amount forecasting model together comprise an inlier claim amount forecasting model.

A still further aspect of the invention is a computer-implemented process of forecasting a claim amount attributable to claims from members of a book of business during an actual policy period, comprising the steps of:

providing person-level data, comprising enrollment data for members of a book of business to be insured, for an actual underwriting period that can be, but is not necessarily, contiguous with the actual policy period;

providing a model development universe of person-level data, comprising enrollment data from the historical base period and historical next period health care claims data for a statistically meaningful number of individuals;

providing enrollment-based risk factors for each historical base period and providing next period claim amounts;

developing a health care cost-forecasting model for the enrollment data by capturing the predictive ability of main effects and interactions of enrollment-based risk factors through the application of an interaction capturing technique to the model development universe;

applying the health care cost-forecasting model to the person-level underwriting period enrollment data of each of the members of the book of business to generate a person-level expected cost forecast for the policy period for each member of the book of business; and

producing a group-level forecast of the expected cost for the policy period from the person-level forecasts of each person in the group by totaling the person-level expected cost forecasts for the actual policy period.

A still further aspect of the invention is a computer-implementedprocess of forecasting costs attributable to claims from members of abook of business during an actual policy period, comprising the stepsof:

providing person-level data, comprising enrollment data and actualunderwriting period health care claims data, for members of a book ofbusiness, where the person-level data on a health care claim comprisesat least a claim amount and a claim code and the actual underwritingperiod can be, but is not necessarily, contiguous with the actual policyperiod;

providing a model development universe of person-level data, comprisingenrollment data, historical base period health care claims data andhistorical next period claim amount data for a statistically meaningfulnumber of individuals, where the person-level data on a base periodhealth care claim includes at least a claim amount and a claim code;

providing claim-based risk factors for each historical base period based on the claim code associated with the health care claim, and providing at least one enrollment risk factor based on the enrollment data;

developing a cost-forecasting model by capturing the predictive ability of main effects and interactions of risk factors through the application of an interaction capturing technique to the model development universe;

applying the cost-forecasting model to the person-level data of each of the individuals or members of a group to generate a person-level actual policy period expected cost forecast for each member of the group; and

producing a group-level forecast for the actual policy period from the person-level forecasts of each individual or member of the group by totaling the person-level cost forecasts for the actual policy period.

Yet a further aspect of the invention is an automated system for forecasting future costs attributable to claims from members of a book of business during an actual policy period, comprising:

a central processing unit;

an insured person database, accessible by the processor, wherein the database comprises person-level enrollment data and actual underwriting period health care claims data for members of a book of business to be insured, where the person-level data on a health care claim comprises at least a claim amount and a claim code;

a model development universe database, accessible by the processor, wherein the model development universe database comprises a model development universe of person-level data, comprising enrollment data, historical base period health care claims data and historical next period claim amount data for a statistically meaningful number of individuals, where the person-level data on the base period health care claim includes at least a claim amount and a claim code;

a risk factor encoder, accessible by the processor, wherein the risk factor encoder encodes claim-based risk factors for each historical base period based on the claim code associated with the health care claim, and the risk factor encoder encodes at least one enrollment risk factor based on the enrollment data;

a model generator, accessible by the processor, that generates a cost-forecasting model by capturing the predictive capacity of the main effects and the interactions of the risk factors assigned by the risk factor encoder, to forecast the historical next period of the model development universe data using the historical base period data;

a person-level cost generator that applies the cost-forecasting model to the person-level actual underwriting period health care claims data of each of the members of the book of business to generate a person-level actual policy period claim amount forecast for each member of the book of business; and

an actual policy period group-level cost forecast generator that totals the person-level actual next period forecasts for each member of the group to generate an actual policy period group-level cost forecast.

Still another aspect of the invention is a computer-implemented process of forecasting costs attributable to claims from members of a book of business during an actual policy period, comprising the steps of:

means for providing person-level data, comprising enrollment data and actual underwriting period health care claims data, for members of a book of business, where the person-level data on a health care claim comprises at least a claim amount and a claim code, and the actual underwriting period can be, but is not necessarily, contiguous with the actual policy period;

means for providing a model development universe of person-level data, comprising enrollment data, historical base period health care claims data and historical next period claim amount data for a statistically meaningful number of individuals, where the person-level data on a base period health care claim includes at least a claim amount and a claim code;

means for providing claim-based risk factors for each historical base period based on the claim code associated with the health care claim, and providing at least one enrollment risk factor based on the enrollment data;

means for developing a cost-forecasting model by capturing the predictive ability of main effects and interactions of risk factors through the application of an interaction capturing technique to the model development universe;

means for applying the cost-forecasting model to the person-level data of each of the individuals or members of a group to generate a person-level actual policy period expected cost forecast for each member of the group; and

means for producing a group-level forecast for the actual policy period from the person-level forecasts of each individual or member of the group by totaling the person-level cost forecasts for the actual policy period.

A still further aspect of the invention is a group insurance product comprising:

an identification of the types of benefits which are agreed to be provided by an insurer to or on behalf of members of a group, which will be incurred by members of said group during a future time period; and

a stated monetary insurance premium including a forecast of said benefits, estimated costs of administering the insurance product, and, optionally, an estimated profit,

whereby an insurer agrees to cover the identified benefits in exchange for the payment of the stated monetary insurance premium.

Yet another aspect of the invention is a method of pricing group insurance, including a cost of future benefits according to the computer-implemented process of forecasting future medical costs attributable to claims from members of a group during an actual underwriting period, comprising the steps of:

providing an expected amount of administrative costs allocable to providing health insurance coverage to the group;

providing a minimum acceptable expected profit;

totaling the group-level cost forecast, the expected amount of administrative costs, and the minimum acceptable expected profit to yield a total minimum price;

providing a plurality of expected probabilities of retention for the group corresponding to a plurality of possible prices greater than or equal to the total minimum price, each possible price also having an expected profit that is the amount by which the price exceeds the group-level cost forecast plus the expected amount of administrative costs; and

calculating a plurality of probability-weighted expected profits by multiplying each of the plurality of possible profits by the corresponding expected probability of retention, wherein the price yielding the largest probability-weighted expected profit is used to price the group insurance.
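
The pricing steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `PriceOption` and `choose_price` are hypothetical names, the retention probabilities are taken as given (in the method they come from the underwriter), and ties are resolved in favor of the first (lowest) qualifying price.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class PriceOption:
    price: float           # candidate renewal premium for the group (hypothetical)
    retention_prob: float  # expected probability that the group renews at this price


def choose_price(cost_forecast: float, admin_cost: float, min_profit: float,
                 options: Sequence[PriceOption]) -> Optional[Tuple[float, float]]:
    """Return (price, retention-weighted expected profit) for the best eligible price."""
    min_price = cost_forecast + admin_cost + min_profit  # total minimum price
    best = None
    for opt in options:
        if opt.price < min_price:
            continue  # prices below the total minimum price are not considered
        profit = opt.price - (cost_forecast + admin_cost)  # expected profit if retained
        weighted = profit * opt.retention_prob             # profit weighted by retention
        if best is None or weighted > best[1]:
            best = (opt.price, weighted)
    return best


# Hypothetical figures: forecast 1,000,000; administration 100,000; minimum profit 20,000.
options = [PriceOption(1_100_000, 0.95), PriceOption(1_150_000, 0.90),
           PriceOption(1_200_000, 0.75), PriceOption(1_250_000, 0.45)]
print(choose_price(1_000_000, 100_000, 20_000, options))  # -> (1200000, 75000.0)
```

The first option is excluded because it falls below the total minimum price of 1,120,000; among the rest, the highest price is not chosen because its lower retention probability outweighs its larger margin.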

Still another aspect of the invention is a method of underwriting an insurance product comprising the steps of:

providing an identification of the coverage of the insurance product which identifies the conditions of payment under the product during a policy period;

providing person-level health care claim information comprising enrollment data, and base period and underwriting period claim data, the claim data comprising claim codes having associated claim costs;

capturing the predictive ability of the person-level health care claim information through the application of an interaction capturing technique; and

forecasting a predicted cost of the insurance product during the policy period based on the identification of the coverage of the insurance product and the captured predictive ability of the person-level health care claim information;

wherein each diagnosis-based and CPT-based risk factor is independent of the sequence in time of the other diagnosis-based and CPT-based risk factors.

A further aspect of the invention is a method of underwriting an insurance product for insuring short term disability (STD) costs, wherein the interaction capturing technique uses a dependent measure from the next period and policy period comprising the number of STD days in the policy period, and weights the dependent measure by the expected cost per day for the STD to produce the person-level expected STD costs, which are summed across the group to produce the group's expected STD cost.
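
The STD weighting described above reduces to a simple product and sum. The sketch below assumes the model has already produced expected STD days per member and that a single expected cost per STD day applies to the whole group; `expected_std_cost` is a hypothetical name.

```python
def expected_std_cost(predicted_std_days, cost_per_day):
    """Person-level expected STD costs and the group total.

    predicted_std_days: model output, expected STD days in the policy period per member.
    cost_per_day: expected cost per STD day (one group-wide value is assumed here).
    """
    person_costs = [days * cost_per_day for days in predicted_std_days]
    return person_costs, sum(person_costs)


persons, group_total = expected_std_cost([0.0, 2.5, 10.0], 120.0)
print(persons, group_total)  # -> [0.0, 300.0, 1200.0] 1500.0
```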

A still further aspect of the invention is insuring long term disability (LTD) claims, wherein a dependent measure for generating the cost forecasting model is the probability of an LTD claim in the policy period, where the probability is weighted by the net present value of the LTD, and applying the cost forecasting model to the person-level data produces person-level expected LTD costs, which are summed across the group to produce the group's expected LTD cost for an actual policy period.
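
A hedged sketch of the LTD weighting: each member's modeled probability of an LTD claim is multiplied by the net present value of that claim, and the products are summed over the group. How the NPV is computed is not specified here, so `npv_level_benefit` assumes, purely for illustration, a level monthly benefit paid at the end of each month for a fixed number of months at a constant annual discount rate. Both function names are hypothetical.

```python
def npv_level_benefit(monthly_benefit, months, annual_rate):
    """NPV of a level monthly benefit paid in arrears (illustrative payout assumption)."""
    r = annual_rate / 12.0
    if r == 0:
        return monthly_benefit * months
    return monthly_benefit * (1 - (1 + r) ** -months) / r  # ordinary annuity factor


def expected_ltd_cost(members):
    """members: iterable of (probability of an LTD claim, NPV of that claim)."""
    return sum(p * npv for p, npv in members)


claim_npv = npv_level_benefit(2_000.0, 60, 0.05)  # hypothetical five-year benefit stream
print(expected_ltd_cost([(0.01, claim_npv), (0.002, claim_npv)]))
```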

A still yet further aspect of the invention is a cost forecast produced for first-dollar health insurance.

Another aspect of the invention is a cost forecast produced for stop loss health insurance.

A still further aspect of the invention is a cost forecast produced for aggregate-only stop loss health insurance.

Still another aspect of the invention is a cost forecast produced for specific stop loss health insurance.

Yet another aspect of the invention is a cost forecast for insuring group term life insurance costs, wherein a dependent measure for generating the cost forecasting model is the expected probability of death weighted by the amount of life insurance to produce the person-level expected term life insurance cost.

In still another aspect, the model development universe comprises data from the members of a group in the book of business to be insured.

A still yet further aspect of the invention comprises the step of setting insurance reserves based on the renewal group-level forecast for the actual underwriting period, wherein the next period is a reserving period for claims that have not occurred or that have occurred but have not been reported.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flowchart of an embodiment of an overview of a method for estimating future cost and optimizing pricing.

FIG. 2 is a flowchart of an embodiment of a method like that of FIG. 1 which is particularly adapted for service bureau processing.

FIG. 3 is a flowchart of an embodiment of a method like that of FIG. 1 which is particularly adapted for use as a software product, which may be functionally distributed locally or over the Internet.

FIG. 4 is a more detailed flowchart of a process for data processing of steps 102, 202 or 302 of FIGS. 1, 2 and 3.

FIG. 5 is a more detailed flowchart illustrating a process for standardizing time periods, for use in the methods of FIGS. 1-3, and in particular steps 102, 202 and 302.

FIG. 6 is a flowchart illustrating data validation and standardization procedures for steps 102, 202 and 302 of the methods of FIGS. 1-3.

FIG. 7 is a flowchart illustrating the matching and merging (integration) of data in process steps 102, 202 or 302 of FIGS. 1-3.

FIG. 8 is a flowchart illustrating the aggregation and risk factor coding for steps 102, 202 or 302 of the processes of FIGS. 1-3.

FIG. 9 is a flowchart of processing steps for developing cost forecasting models based on “inlier” data in steps 106, 204, 210, 304 or 310 of the methods of FIGS. 1-3.

FIG. 10 is a detailed flowchart of process steps for developing cost forecasting models based on “outlier” data of the Winsorized data for steps 106, 204, 210, 304 or 310 of the methods of FIGS. 1-3.

FIG. 11 is a detailed flowchart for scoring, testing and integrating the data, and adjusting for cost trends, for use in steps 106, 204, 210, 304 or 310, as well as 108, 208 and 306, of the methods of FIGS. 1-3.

FIG. 12 is a detailed flowchart illustrating processing steps for developing group-level models and making adjustments to the summary of the person-level data of steps 106 and 108 of FIG. 1, 204, 208 and 210 of FIG. 2, or 304, 306 and 310 of FIG. 3.

FIG. 13 is a detailed flowchart of an embodiment of a price optimization procedure which may be used to carry out steps 110, 212, or 308 of FIGS. 1-3.

DETAILED DESCRIPTION OF THE INVENTION

The present invention is directed to insurance systems, particularly including methods for processing health insurance data to estimate future costs, and for optimizing pricing of health insurance products, including both first-dollar and stop loss insurance products. In various aspects, it involves processing historical data, developing algorithms, applying those algorithms, updating those algorithms and setting prices. The insurance systems that can benefit from the methods and systems disclosed herein include, but are not limited to, health insurance, short-term and long-term disability insurance, and term life insurance systems.

This invention comprises a series of related products that provide more accurate group-level claim amount forecasts (and person-level forecasts for individual or family health insurance) and more nearly optimal group-level renewal prices for insurers at full risk for health insurance (e.g., indemnity, PPO, HMO, POS) or for aggregate-only stop loss health insurance for self-insured employers. These forecasting models for renewal price setting are not intended to be used for paying managed care providers, but alternative related models are developed for that purpose (see B in Table 1 below). The products provide more accurate future cost estimates by forecasting person-level costs using models that include clinical information from historical health insurance claims as well as person-level demographic and historical cost data. In this regard, effective models may be based on data from relatively large groups of at least 50,000 people, such as typically cover an entire book of business for an insurer (or a large subclass of the insurer's book of business, such as all HMO groups of the insurer) or, in the case of a TPA, the TPA's entire book of business. The most recent year of person-level medical claim data for the individuals of a particular book of business for which an accurate cost forecast is desired may be processed by this model to produce an accurate projected cost for policy pricing, as will be described. Future cost trend estimates (inflation) are adjusted for each individual's characteristics and applied to the person-level estimates. Person-level cost forecasts are summarized to the family level or group level, and family- or group-level characteristics are used to adjust the summarized cost to produce the adjusted family- or group-level cost forecast. The price is optimized using a system that estimates the probability of the group accepting the insurance at the price offered, given the group's historical insurance cost, claims history, and local competitive market conditions.
The probability is weighted by a function of the expected future profit, which equals the anticipated price less expected medical and administrative costs. With slight adjustments, the method and models can be applied to self-insured employers' aggregate-only, specific-only, or specific-plus-aggregate medical stop loss data. The products also include the use of the method, applied to a client's book of business, for estimating future claim amounts for purposes of setting a reserve by group, and for cost forecasting and pricing for new groups or individuals for fully insured health insurance. Another alternative application would be the use of the method to develop and deliver products that allow HMOs to prospectively allocate health care payments to providers. Another product is the measurement of the efficiency of health care providers. These methods can be applied to medical claims linked to future short- or long-term disability payments or indicators of disability, and used to rate the relative risk of disability of groups or forecast their future costs, by using the groups' medical claims, enrollment data, and summarized group-level or person-level disability payments. Another application is to group term life insurance. The dependent measure is the probability of death in the next period, which is linked to medical claims in the base period, and the potential risk factors are the same as those used with the other models.

The modeling strategy employed for the cost forecasting models contains several novel components. We have used a combination of specialized data collection and cleaning, regression trees, and regression (ordinary least squares or OLS, logistic, and median) models tailored to a client's book of business, and the application of these models to the client's book of business for improved decision making. While there are many published examples of OLS being used for purposes similar to this application, there are only a few using trees. We are not aware of any reports using a combination of regression trees and other regression models to forecast health care costs. The use of the output of a tree model as an input to other regression algorithms is known as a “hybrid” tree model. (See D. Steinberg and N. Scott Cardell, Improving Data Mining with New Hybrid Methods, Salford Systems, May 27, 1998, Powerpoint@http://www.Salford-systems.com.) They give examples of models with a binary (yes-no) dependent variable for which they used the regression tree output as predictors in a regression model, and they demonstrated that this hybrid combination was superior to either method used alone. When our dependent variable is cost, we used OLS regression with the output of regression trees, and when the dependent variable is a probability, we used logistic regression. This allowed us to have continuous-valued predictions rather than the step-like predictions characteristic of trees and contingency table forecasts. Our use of the terminal nodes of a regression tree as predictors in an OLS or logistic regression model provides an effective way to have both the main effects and the complex interactions of candidate predictors properly weighted in our final model.
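
The hybrid idea (terminal nodes of a regression tree used as indicator predictors in an OLS regression alongside the main effects) can be sketched in NumPy as below. This is an illustrative sketch, not the authors' implementation: the small greedy least-squares tree, the `max_depth`/`min_leaf` settings, and all function names are hypothetical, and a production model would use a full regression-tree package with pruning and out-of-sample validation. Only the cost (OLS) case is shown; for a probability dependent variable, the same design matrix would instead feed a logistic regression.

```python
import numpy as np


def best_split(X, y, min_leaf):
    """Return (sse, feature, threshold) of the least-squares split, or None."""
    best = None
    n, p = X.shape
    for j in range(p):
        for t in np.unique(X[:, j])[:-1]:  # candidate thresholds (max value excluded)
            left = X[:, j] <= t
            n_left = int(left.sum())
            if n_left < min_leaf or n - n_left < min_leaf:
                continue
            yl, yr = y[left], y[~left]
            sse = ((yl - yl.mean()) ** 2).sum() + ((yr - yr.mean()) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t)
    return best


def grow(X, y, depth, max_depth, min_leaf, counter):
    """Greedy recursive partitioning; terminal nodes are numbered 0, 1, 2, ..."""
    split = best_split(X, y, min_leaf) if depth < max_depth else None
    if split is None:
        counter[0] += 1
        return ("leaf", counter[0] - 1)
    _, j, t = split
    mask = X[:, j] <= t
    return ("node", j, t,
            grow(X[mask], y[mask], depth + 1, max_depth, min_leaf, counter),
            grow(X[~mask], y[~mask], depth + 1, max_depth, min_leaf, counter))


def leaf_of(tree, x):
    """Route one row down the tree and return its terminal-node number."""
    while tree[0] == "node":
        _, j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree[1]


def hybrid_design(tree, n_leaves, X):
    """Main effects (raw predictors) plus one indicator column per terminal node."""
    leaves = np.array([leaf_of(tree, row) for row in X])
    return np.hstack([X, np.eye(n_leaves)[leaves]])  # indicators also act as intercept


def fit_hybrid(X, y, max_depth=2, min_leaf=20):
    counter = [0]
    tree = grow(X, y, 0, max_depth, min_leaf, counter)
    coef, *_ = np.linalg.lstsq(hybrid_design(tree, counter[0], X), y, rcond=None)
    return tree, counter[0], coef


def predict_hybrid(tree, n_leaves, coef, X):
    return hybrid_design(tree, n_leaves, X) @ coef
```

Because the terminal-node indicators encode combinations of split conditions, an interaction such as “both risk factors high” can receive its own OLS weight, while the main-effect columns keep the predictions continuous within each node.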

A typical group health insurance product in accordance with the present invention (such as the various types of Blue Cross™ and Blue Shield™ brand group health insurance policies, which are incorporated herein by reference) comprises an identification of the types of medical expenses which are agreed to be covered, paid or reimbursed by an insurer to or on behalf of members of the group (including their covered dependents), which are incurred by members of the group during a future time period, typically one year, in exchange for a stated monetary insurance premium which includes a forecast of said medical expenses in accordance with the methods described herein, estimated costs of administering the health insurance product, and an estimated profit.

Table 1 summarizes the alternative uses of our method as applied to health care enrollment and claims data linked with claim amounts for first-dollar and stop loss coverage, disability coverage, reserves, and term life coverage. These alternative model developments produce products that are customized for specialized applications. Row A-1 is the application of our invention which is presented in most detail in this application. The methods used in A-1 are clearly related to those in each of the other rows.

TABLE 1
Applications of the Invention's Modeling Methods

Application | Allowable Sources of Candidate Risk Predictors: Enrollment Data | Allowable Sources of Candidate Risk Predictors: Claims Data | Dependent Variable (Services Provided During Dep. Variable Ref. Time) | Reference Times for Dependent & Predictor Variables | Model Type
A. Predict Future Costs of Health Insurance | | | | |
A-1. Renewal Groups or Individuals | All | All | Cost of Claims | Predictor Variable Precedes | Prospective
A-2. Stop Loss: Specific Only, Aggregate Only or Specific Plus Aggregate | All | All | Cost of Claims over Deductible, over Attachment Point or Both | Predictor Variable Precedes | Prospective
A-3. Required Reserves | All | All | Reserve Period IBNR | Predictor Variable Precedes | Prospective
A-4. New Groups or Individuals | All | None | Cost of Claims | Predictor Variable Precedes | Prospective
B. Allocate payments to health care providers | All | Diagnosis | Standardized Costs of services provided | Predictor Variable Precedes | Prospective
C. Measure “Efficiency” of care providers | All | Diagnosis & selected mandatory procedures | Standardized Costs of services provided* | Predictor Variable concurrent with Dependent Variable | Retrospective
D. Short Term Disability | All | All + STD Claims Payments | STD days, Cost or Index | Predictor Precedes | Prospective
E. Long Term Disability | All | All + STD + LTD | Probability LTD, Cost or Index | Predictor Precedes | Prospective
F. Group Term Life | All | All + Death | Probability Death, Cost | Predictor Precedes | Prospective

*Costs per service can be standardized by use of relative values for CPT codes and DRG weights for hospital care, or average actual costs for each service.

Optimal pricing for a fully insured group requires an accurate forecast of the group's mean cost per person in the policy period. Optimal pricing for aggregate-only medical stop loss insurance for a self-insured employer also requires an accurate forecast of that group's mean cost per person in the policy period. Therefore, exactly the same methodology can be used for the cost forecast for fully insured groups or for a self-insured group's aggregate-only stop loss insurance, if the same data are available. The methods used to set prices differ, however, since a self-insured employer pays for the majority of the medical expenses itself and therefore pays a premium that is far smaller than with full health insurance, where the insurer pays all of the medical costs.

CapCost™ is an aggregate-only medical stop loss product that includes a system for making more accurate cost forecasts (mainly for groups with 51 to 3000 employees). The attachment point for CapCost™ can be the standard 125% of expected costs (called CapCost 125™), but we will offer an attachment point at 110% of expected costs (called CapCost 110™) and possibly other attachment points. The terms of CapCost™ are similar to those of traditional medical stop loss insurance, but there is cash flow protection, medical costs are cumulated on an incurred basis rather than a paid basis, and there is no specific stop loss coverage. CapCost™ is useful for employers since many will receive prices that are below the price of traditional specific plus aggregate medical stop loss insurance, while the maximum aggregate medical liability for the group may be lower with CapCost™ than with traditional specific plus aggregate medical stop loss insurance. From the insurer's perspective, the expected medical claims it must pay with CapCost™ are frequently below those of traditional medical stop loss products, since specific stop loss coverage is not provided. Generally, CapCost™ is a better value for the employer than traditional stop loss coverage when the employer is larger than the average employer purchasing stop loss coverage, or if the group has experienced some unusually high annual medical expenses due to a few high-cost individuals who are unlikely to have high costs recurring in the near future.
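
How an aggregate-only attachment point divides a group's incurred claims can be illustrated as follows. This is a simplified sketch, not the CapCost™ contract terms: it assumes the employer pays incurred claims up to the attachment point (a percentage of expected costs), the insurer pays everything above it with no cap, and cash flow protection and timing are ignored. `aggregate_stop_loss` is a hypothetical name.

```python
def aggregate_stop_loss(actual_claims, expected_claims, attachment_pct=1.25):
    """Split incurred claims between employer and insurer (aggregate-only, no specific layer).

    attachment_pct=1.25 mirrors CapCost 125; 1.10 mirrors CapCost 110.
    Assumes no cap on the insurer's excess payment.
    """
    attachment_point = attachment_pct * expected_claims
    employer_pays = min(actual_claims, attachment_point)      # employer's maximum liability
    insurer_pays = max(0.0, actual_claims - attachment_point)  # excess over attachment
    return employer_pays, insurer_pays


# Hypothetical group: expected claims 1,000,000; incurred claims 1,400,000.
print(aggregate_stop_loss(1_400_000, 1_000_000))        # 125% attachment point
print(aggregate_stop_loss(1_400_000, 1_000_000, 1.10))  # 110% attachment point
```

Lowering the attachment point from 125% to 110% caps the employer's aggregate liability at a lower amount, which is why a more accurate forecast of expected costs matters more for the lower attachment point.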

CapCost™ is novel in the way expected future medical costs are estimated. Historical medical claims, enrollment, benefit plan, and employer files in electronic format are collected from the Third Party Administrator (TPA) or insurance company that is paying the employer's medical bills. The electronic files containing the medical claims and enrollment data are collected for all people with medical coverage, rather than only for those who had large claims. This invention's cost forecasting models are applied to the insured people covered by the employer. The inflation trend and optimized pricing are then applied to the cost estimates. The CapCost™ product is a system for data collection, cost estimation, and price optimization, and is part of this invention. Separate products are designed for pricing new or renewal coverage for fully insured medical plans and for allocating reserves for such medical plans. Each contains a system for data collection and cost estimation. Price optimization is an additional part of this invention for fully insured medical plan renewals and stop loss coverage.

One of the important measures of the quality of a model is the mean absolute residual (MAR). The MAR is the mean of the absolute value of the difference between the actual and predicted cost of a group. A lower MAR is desirable, since it means the predicted cost is closer to the actual cost. We compared the MAR for this invention's predicted cost with the MAR calculated using an experience model and the MAR calculated using a demographic look-up table model. The results are presented as a percentage of the mean of the groups' costs (the MAR divided by the mean actual cost, times 100). The MAR was 11.6% for the invention's prediction, 14.2% for the experience model, and 25.8% for the demographic model for the 116 actual groups in our database. The invention forecast was substantially better than either of the two conventional forecast methods.
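
The MAR comparison can be computed as below. The sketch assumes one actual and one predicted cost per group, and expresses the MAR as a percentage of the mean actual group cost; the function names are hypothetical.

```python
def mean_absolute_residual(actual, predicted):
    """Mean of |actual - predicted| across groups."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mar_percent(actual, predicted):
    """MAR as a percentage of the mean actual group cost."""
    mean_actual = sum(actual) / len(actual)
    return 100.0 * mean_absolute_residual(actual, predicted) / mean_actual


actual = [100.0, 200.0, 300.0]     # hypothetical group costs
predicted = [110.0, 180.0, 300.0]
print(mean_absolute_residual(actual, predicted), mar_percent(actual, predicted))  # -> 10.0 5.0
```

Comparing two models then amounts to computing `mar_percent` for each model's predictions against the same actual costs; the lower value indicates the more accurate forecast.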

We conducted a Monte Carlo simulation for groups with various numbers of employees, since our database is too small to analyze by group size. We randomly selected 1500 enrollees and their dependents for each of 500 synthetic groups. The MAR as a percentage of the groups' actual cost was about 7% for the invention's forecast and just under 10% for the experience forecast. A demographic forecast was not compared, since groups with over 1500 employees and their dependents are deemed completely credible.
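
Synthetic groups for such a simulation can be assembled as sketched below. The sketch assumes each enrollee record already carries the total actual and forecast cost of the enrollee plus dependents, and draws each group independently (without replacement within a group). Whether the original simulation allowed the same enrollee in several groups is not stated, so that choice is an assumption. `synthetic_groups` is a hypothetical name.

```python
import random


def synthetic_groups(families, group_size=1500, n_groups=500, seed=0):
    """families: list of (actual_cost, forecast_cost) per enrollee, dependents included.

    Returns a list of (group actual cost, group forecast cost) totals.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible simulation
    groups = []
    for _ in range(n_groups):
        members = rng.sample(families, group_size)  # no enrollee twice within a group
        groups.append((sum(a for a, _ in members), sum(f for _, f in members)))
    return groups
```

The resulting (actual, forecast) pairs can then be scored with the same MAR percentage used for the real groups.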

Another measure of model accuracy addresses whether, and by how much, the model systematically over- or under-predicts the actual costs for various characteristics of the insured population. To compare this accuracy measure for two models, we sort the groups by actual cost into deciles, from the lowest 10% to the highest 10%. We calculate the predicted (forecast) cost for the groups in each decile (or finer gradation). The actual cost is divided by the forecast cost to make an index, which should be close to 1.0 if the model is accurate. In our simulation tests (500 groups of 1500 employees), the invention's forecast is closer to 1.0 for every decile, indicating that it is a superior model to the experience model. The invention's ratio of actual to predicted cost was about 0.91 for the lowest decile and about 1.32 for the highest decile, while the experience model's ratios were about 0.85 and about 1.55, respectively. The other deciles were closer to 1.0, but the invention forecast was always closer to 1.0 than the experience forecast.
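
The decile index can be sketched as follows. It assumes that the index for a decile is the ratio of total actual cost to total forecast cost of the groups in that decile (a mean of per-group ratios would be another reasonable choice). `decile_indices` is a hypothetical name.

```python
def decile_indices(actual, forecast, n_bins=10):
    """Sort groups by actual cost, split into n_bins, return actual/forecast per bin."""
    order = sorted(range(len(actual)), key=lambda i: actual[i])
    n = len(order)
    indices = []
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not idx:
            continue  # fewer groups than bins
        indices.append(sum(actual[i] for i in idx) / sum(forecast[i] for i in idx))
    return indices
```

A well-calibrated model yields indices near 1.0 in every bin; because groups are sorted on actual cost, any forecast that regresses toward the mean shows indices below 1.0 in the lowest deciles and above 1.0 in the highest.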

The invention includes a general process for developing models for forecasting health care costs. The invention also includes processes for products that incorporate the process and provide information for improving specific business decisions made by health insurers, including, but not limited to, aggregate-only, specific-only and specific-plus-aggregate stop loss health insurance products. The models may be developed for specific insurers and their books of business, and may be different for each insurer. A software listing of an embodiment of a program for carrying out a forecasting process in accordance with the present invention is present on the above-cited CD-ROMs. Illustrated in FIG. 1 is a flowchart which represents an overview of an embodiment of a method in accordance with the present invention as applied to cost forecasting and pricing of renewals of health insurance for fully insured groups.

In accordance with the method of FIG. 1, health data on members of the book of business are collected, cleaned, integrated and aggregated, as shown in step 102. If the data are missing or miscoded, the cost forecasts may also be inaccurate. Most of the programming cost and analysis involves these phases of the process. The client's data may typically be in many different computer systems or databases, and the data may need to be combined to build person-level files that are complete for a specified time period.
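
Combining claims and enrollment into a person-level file for one base period might look like the sketch below. The record layouts, field names and claim codes are hypothetical. Claims that do not match an enrolled person, or that fall outside the period, are simply skipped here; in practice they would be logged for validation. Every enrolled person appears in the output, including those with no claims, consistent with collecting data for all covered people rather than only for large claimants.

```python
from collections import defaultdict
from datetime import date


def person_level_file(enrollment, claims, start, end):
    """enrollment: dict person_id -> dict of enrollment fields (hypothetical layout).
    claims: iterable of (person_id, service_date, amount, claim_code).
    Returns person_id -> enrollment fields plus base-period cost and claim codes.
    """
    totals = defaultdict(float)
    codes = defaultdict(set)
    for pid, svc_date, amount, code in claims:
        if pid not in enrollment or not (start <= svc_date <= end):
            continue  # unmatched claim or outside the base period
        totals[pid] += amount
        codes[pid].add(code)
    return {pid: {**fields, "base_cost": totals[pid], "claim_codes": sorted(codes[pid])}
            for pid, fields in enrollment.items()}


enrollment = {"A": {"age": 40}, "B": {"age": 7}}
claims = [("A", date(2000, 3, 1), 120.0, "250.00"), ("A", date(2000, 5, 1), 80.0, "401.9"),
          ("C", date(2000, 3, 1), 50.0, "V20.2"), ("B", date(2001, 2, 1), 30.0, "V20.2")]
print(person_level_file(enrollment, claims, date(2000, 1, 1), date(2000, 12, 31)))
```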

A twelve-month “base period” is typically used as the period from which we collect these data to describe each person's history of claims, diagnoses and other factors. The base period could be longer or shorter, depending on how long the groups have been enrolled and the time for which adequate computer or other records are kept. The base period may cover different time periods for people and groups that do not have the same enrollment renewal dates.

There is typically a period between the “base period” (or underwriting period) and the “next period” (or policy period) during which medical claims data are not available, since the claims were incurred but not reported, or they fall between the time of the price quote for the policy period's renewal and the renewal date. We call this the “lag period”. The examples here use a lag period of three months, but it could be longer or shorter depending on the needs and constraints of the available data, the insurer or others.

The “next period” is typically the period of twelve months of insurance coverage immediately following the lag period. The claim amount forecast period is the next or policy period that is priced for the group. The “next period” is the relevant time period for the dependent variable in the cost forecast models.
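
The relationship among the three periods can be sketched in Python, assuming the twelve-month base period, three-month lag period and twelve-month next period used in the examples, and a next period that starts on the first day of a month. The helper names `add_months` and `study_periods` are hypothetical.

```python
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Return the first day of the month `months` away from d's month."""
    years, month0 = divmod(d.month - 1 + months, 12)
    return date(d.year + years, month0 + 1, 1)


def study_periods(next_start: date, base_months=12, lag_months=3, next_months=12):
    """Inclusive (start, end) windows for the base, lag and next periods."""
    if next_start.day != 1:
        raise ValueError("next_start must be the first day of a month")
    lag_start = add_months(next_start, -lag_months)
    base_start = add_months(lag_start, -base_months)
    one_day = timedelta(days=1)
    return {
        "base": (base_start, lag_start - one_day),
        "lag": (lag_start, next_start - one_day),
        "next": (next_start, add_months(next_start, next_months) - one_day),
    }


print(study_periods(date(2001, 1, 1)))
```

For a January 1, 2001 renewal this places the base period at October 1999 through September 2000 and the lag period at October through December 2000.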

If the insurer for which future health costs are to be forecast (e.g., a business entity which desires to provide health insurance) is a new client (e.g., has not had models previously built on its book of business), then a new cost forecasting model may need to be developed for it, for example, as shown in step 104 of FIG. 1. An alternative is to use existing forecasting models and recalibrate those models to the new or updated data. Our methods include a systematic process to develop new models or recalibrate old models. A new model is developed when the old database upon which the old model was developed is not representative of the new database. This might occur if the new database is substantially different in size, covers a different geographic region, contains different types of insurees (e.g., predominantly elderly in Medicare; pregnancy and children are characteristic of Medicaid), or contains different types of payments (e.g., capitation payments plus fee-for-service payments).

The selection of the population to be modeled is of key importance, since the predictor variables and their weights will reflect not only the specific needs of the population, but also the practice patterns of those providing care and the prices charged for health care services. The ideal population to use as a standard is the CI's book of business for which the forecasts are needed, provided it is of sufficient size. We have found that an insured population (i.e., book of business) as small as 50,000 persons can produce robust cost forecasts.

Using another, smaller or less representative population as a standard can cause problems in the selection of risk factors, because there is no reason to believe that needs per person (even after adjustment for demographic factors), practice patterns of providers, or prices per service in what amounts to a convenience sample will be similar enough to those of the index population, no matter how large the convenience sample may be. These three cost components are known to vary by geographic locale, by the socioeconomic status of the insured, and by the characteristics of the providers and the features of the health insurance.

As shown in step 106, if it is determined that a new cost forecasting model should be developed, there is a specified process for developing the model. The method for developing the new cost forecasting model is part of our product, and it can be applied to any medical insurance database that includes the necessary information.

To develop a new cost forecasting model for a specific customer, we need data from groups that were in its historical “base period” and “next period”. Claims data from the “lag period” are not necessary, since they need not be used in the model, but they are generally collected. The cost forecasting model is calibrated on the historical data to model the dynamics of medical care, practice patterns, and pricing in the geographic markets and provider networks used by the customer. The groups of insured people used as a standard in our models must be enrolled on at least the last day of the “base period”, and for the entire lag period and next period. Multiple sets of base period, lag period and next period can be used to increase the amount of data used to create the cost forecasting model. More data produce more robust models, but the data must be adjusted for secular cost trends when there are multiple calendar years for the “base period”.
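
The enrollment requirement for the standard population (enrolled on at least the last day of the base period, and for the entire lag and next periods) can be checked per person as below. This is a hedged sketch: it assumes enrollment is stored as inclusive (start, end) date spans and treats overlapping or back-to-back spans as continuous coverage. `merge_spans` and `eligible_for_development` are hypothetical names.

```python
from datetime import date, timedelta


def merge_spans(spans):
    """Merge overlapping or back-to-back (next-day) enrollment spans."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1] + timedelta(days=1):
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def eligible_for_development(spans, base_end, next_end):
    """True if continuous coverage runs from the last base-period day through next_end."""
    return any(s <= base_end and e >= next_end for s, e in merge_spans(spans))
```

Applied to every person in the historical data, this filter yields the model development universe for one set of base, lag and next periods; repeating it for several sets of periods increases the data available, as described above.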

Scoring the data for pricing insurance for the policy period involves applying the forecasting model to the underwriting period data that will be used to forecast cost for the policy period, i.e., the renewal year that needs pricing, as shown in processing block 108. Generally, the most recent nine months of the previous next period will fall within the new underwriting period, which is offset by the three-month lag period. This helps in processing the data needed for predicting future costs. The first step in scoring (block 108) is applying to the new underwriting period data any data steps that have not previously been applied (e.g., coding of risk factors). Second, the cost forecasting model is applied to the person-level data. External health care inflation forecasts from the CI or a consulting organization are then used to adjust the prior year's trend inherent in the person-level forecasts, and the inflation-adjusted person-level cost forecasts are aggregated to the group level. Third, group-level adjustments to the forecasts are applied for benefit plan design, SIC code, and other factors influencing group costs.
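The second and third scoring steps can be illustrated with a minimal Python sketch. It assumes person-level forecasts have already been produced by the model, that the inflation adjustment is a single multiplicative factor that replaces the trend embedded in the historical data with the external forecast, and that all group-level adjustments (benefit design, SIC code, and so on) have been combined into one multiplicative factor per group. The function and parameter names are hypothetical.

```python
from collections import defaultdict


def score_groups(person_forecasts, embedded_trend, forecast_inflation, group_factors):
    """Aggregate person-level cost forecasts into group-level forecasts.

    person_forecasts: iterable of (group_id, forecast_cost) pairs.
    embedded_trend: annual trend implicit in the model's historical data (e.g. 0.06).
    forecast_inflation: external inflation forecast for the policy period (e.g. 0.08).
    group_factors: dict of group_id -> combined multiplicative group-level adjustment.
    """
    # Replace the historical trend with the external inflation forecast.
    trend_adjustment = (1.0 + forecast_inflation) / (1.0 + embedded_trend)

    # Sum inflation-adjusted person-level forecasts within each group.
    group_totals = defaultdict(float)
    for group_id, cost in person_forecasts:
        group_totals[group_id] += cost * trend_adjustment

    # Apply group-level adjustments; groups without one keep a factor of 1.0.
    return {g: total * group_factors.get(g, 1.0) for g, total in group_totals.items()}
```

A multiplicative form is only one plausible choice; additive group adjustments would fit the same structure.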

Having forecast the group's future medical expenses over the selected period (e.g., 1 year), the price to be charged for the group's medical insurance for that period may be determined, as shown in block 110 of FIG. 1. The insurer generally seeks a fair, or even maximum, profit without causing the group to leave for another insurer. The competitiveness of the market, historical prices, and historical costs are all factors that will influence the likelihood of the group being retained at any given price. The policy premium, the price charged to the customer for medical insurance coverage of the specific group, comprises the forecast medical cost, the insurer's overhead and other business expenses, and a projected profit. The client's underwriters are asked to provide explicit probabilities of retaining a group at various price increases. These probabilities are multiplied by the expected profit if the group is retained, giving the expected profit for that group at each price increase. The information is presented to the underwriter with the profit-maximizing premium highlighted and recommended. These recommended prices may be higher or lower than prior prices, but will typically reflect the future medical costs of the specific group more accurately.
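The expected-profit calculation can be sketched as follows. The sketch assumes that the profit if the group is retained equals the premium at a given increase minus the forecast medical cost and the insurer's expenses, and that the underwriter's retention probabilities are supplied as a mapping from fractional price increase to probability. All names and figures are hypothetical.

```python
def best_price_increase(prior_premium, forecast_cost, expenses, retention_probs):
    """Return the price increase with the highest expected profit, plus the full table.

    retention_probs: dict mapping a fractional increase (0.05 = 5%) to the
    underwriter's probability of retaining the group at that price.
    """
    expected_profit = {}
    for increase, p_retain in retention_probs.items():
        premium = prior_premium * (1.0 + increase)
        profit_if_retained = premium - forecast_cost - expenses
        # Expected profit = P(retain) x profit if the group is retained.
        expected_profit[increase] = p_retain * profit_if_retained
    best = max(expected_profit, key=expected_profit.get)
    return best, expected_profit


best, table = best_price_increase(
    prior_premium=100_000, forecast_cost=90_000, expenses=8_000,
    retention_probs={0.00: 0.95, 0.05: 0.85, 0.10: 0.45},
)
print(best)  # 0.05
```

Returning the whole table, not only the maximizing increase, matches the idea of presenting every option to the underwriter with the recommended one highlighted.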

FIGS. 2 and 3 similarly provide an overview of the information flows for two different embodiments. The embodiment of FIG. 2 involves substantially only the transfer of data. The embodiment of FIG. 3 involves installing software at the client or an Internet connection with the client's software.

Shown in FIG. 2 is a “service bureau” embodiment in which all of the data preparation, cost forecasting, model development, data scoring, and pricing for specific individual groups are carried out at a service bureau location. As shown in block 202, medical history and claims data for members of the group are sent to the service bureau location, and a cost forecast, a per-group price, or both are sent back to the client (see 212). An alternative is for software to be installed in the client's (insurance company's or third party administrator's) operations, with model updates provided to the client periodically.

These historical data (typically provided by an insurance company or TPA) are used to develop a model that is calibrated to the book of business (see the example data request) and/or to specific policy types of insurance companies. A base period, lag period, and next period are required at a minimum. The data are fully validated before model development.

As shown in block 204, cost forecasting models are developed, which include person-level inlier models based on the Winsorized data (see FIG. 9), outlier cost components (see FIG. 10), inflation adjustments (see FIG. 11), group-level attribute models (see FIG. 12), and pricing models (see FIG. 13).

As shown in block 206, once those models are developed and preferably fully tested, the most recent available data can be scored, as shown in block 208, to establish cost forecasts and set prices for upcoming medical insurance coverage. The most recent data are sent to us for validation, scoring, future cost estimation, cost trend adjustments, and pricing (blocks 206 and 208). Data are submitted approximately monthly or quarterly. There is a trade-off between getting the most recent claims data available for pricing and the effort required to validate data submitted more frequently and at shorter intervals.

The data are stored and combined with the previous data submission until three to six months of new data are available, as shown in block 210. The new data are combined with the most recent data from the previous submission so that the most recent 12 months of data are available; these are used as the updated next period for recalibrating the models used to score other groups. In other words, the old models are refit with the new data, and updated cost trends are also included. Every one to two years the models may be revised with updated predictor variables and weights. Redoing the models helps capture changes in practice patterns and relative pricing.

As shown in block 212, the summarized cost forecast and pricing information is sent to the client for use by underwriters or in an automated quotation system. The insurance company or other underwriter client may also apply its own pricing algorithm to the cost forecast produced by the method of FIG. 2.

As indicated, FIG. 3 similarly illustrates an overview of an embodiment of the present invention that may be used directly by a health insurer or medical underwriter.

As shown in blocks 305, 306, and 308, the operational software and work flows of the client database may be adapted to automatically extract data, validate it, score the data with the forecasting models, and price the groups. The medical history, cost, and other data elements used, and the timing of the data extracts, are normalized or standardized for use in the method and for automating the recalibration of the models, as shown in block 310. An alternative to installing the software on the client's computers is to perform that task over the Internet (as an Internet Service Provider or ISP), extracting the data and returning cost forecasts and group prices to the client.

As shown in block 306, processing software modules for carrying out the present method may be installed on client computers to use the standardized data.

As shown in block 308, after the medical cost forecast for a specific group is determined, prices are offered to that group for renewed medical insurance, whether first-dollar, stop-loss, or other coverage. This can be done by a human underwriter or as part of an automated quotation system.

The software captures the updated data and combines it, as shown in block 310. Those data are used to recalibrate the models after about three to six months of data accumulation. The updating may be performed offline, or may include automatic database updating and model recalibration. Completely new models may be developed offline about every one to two years.

Having described an overview of several embodiments as illustrated in FIGS. 1-3, we now describe various processing steps of the illustrated methods in more detail.

402 The first step in the data portion of the process is the data request. We do not need data in a predetermined layout or format. Some variables may not be available for a given CI, TPA, or other data provider. The process is flexible, so it can be modified to work around the alternative formats and data sets used to formulate the candidate predictor variables. However, the dollar value of claims made in the base period and claims paid (or disability or life indicator ratios) in the next period are essential. Enough time for run-out of claims is necessary so that incurred but not reported (IBNR) claims are included in the data. The following is an example of a data format that may be used as a request for health and medical cost data to be used in forecasting medical costs:

EXAMPLE DATA REQUEST

In a preferred embodiment, these data are in the form of five different data files that are linked by an encrypted identifier. The identifier should include unique characters for the company, family, and person. The data files should include group-level information, person-level information, detailed medical claims information (e.g., hospital, physician, durable medical equipment, home health), detailed pharmacy claims, and capitation information, if germane.

Preferably, data are provided for a relatively large number of people, e.g., 500,000, covering 27 consecutive months (a 12-month base period, a 3-month lag period, and a 12-month test period).

Descriptions of preferred data are as follows. Some of these variables, especially some of the group-level variables, may not be readily available and accordingly would not be used in model building and medical cost forecasting. Other data that define useful variables may also be included.

1. Group-level data (for any group covered during the test period)

a. Company identifier

b. Group location (zip code or state and county codes)

c. Benefit plan description (format and content TBD)

d. SIC code or other industry classification

e. Original group effective date

f. Employer and Employee premium contribution %

g. Total number of covered employees on date last renewed or date lapsed

h. Next scheduled renewal date

i. % employee participation

j. Capitation payments by provider type by geographic locale

2. Enrollment data (person-level for each person covered above)

a. Company identifier

b. Person identifier

c. Age and birth date

d. Sex

e. Relationship to employee

f. Status of employee (e.g., COBRA, pensioner)

g. Employee type (e.g., hourly)

h. Zip code of residence

i. Date of enrollment

j. Date of termination during study period, if any

k. Presence of other health insurance (e.g., spouse coverage, Medicare)

l. Salary or wage

m. Amount of term life coverage

n. Amount and terms of disability coverage

3. Medical claims (claim-level)

a. Person/company identifier

b. Service line-level information:

i. Billed charges, covered charges, payments, amounts applied to deductibles, coinsurance, co-pays, and out-of-network penalties, amounts of COB, pre-existing, capitation payments and other cutbacks

ii. Dates incurred, entered, and paid

iii. Array of ICD-9 diagnoses (5+) for each claim

iv. CPT code for each claim

v. Provider type (e.g., physical therapist, clinical psychologist, cardiologist)

vi. For confinement in any sort of inpatient facility, include partial bills, DRG for inpatient hospital, admission and discharge dates, partial/final bill indicator

vii. Service type/location (e.g., ER, surgicenter, home)

viii. Amount of subrogation

ix. Type of payment (e.g., fee for service or capitation)

4. Pharmacy data (claim-level)

a. Person/company identifier

b. National Drug Code or other classification

c. Date of prescription

d. Number of units, dose of units, and number of units/day (if available)

e. Billed charges, discounted charges, and payments

5. Capitation payments, if germane

a. Geographic locale or market

b. Provider type

c. Amount and dates

d. Method for payment (e.g., per member per month)

The models can be built without pharmacy data if pharmacy is not covered by the insurance. Enrollment and medical claims data are required. Many of the group-level variables are desirable but optional. The data request would specify the dates for the beginning of the base period and the end of the next period, or the new base period to be used for the cost forecast for pricing. Because the data may originate from a variety of databases and sources, control totals (e.g., number of records, sums of fields) are also included to ensure that the data are extracted and formatted properly. The customer or TPA may provide a layout or format for the data, because a specific format is not required. The layout or other documentation should, however, describe all of the legitimate values for the variables and the meaning of those values (e.g., provider type = 3 = physician).
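A sketch of the control-total check, assuming each extracted file is read as a list of dictionaries and the data provider reports a record count plus sums for selected numeric fields. The field names and the one-cent tolerance are hypothetical.

```python
def control_totals(records, sum_fields):
    """Compute a record count and rounded field sums for one extracted file."""
    totals = {"record_count": len(records)}
    for field in sum_fields:
        totals[field] = round(sum(r.get(field, 0.0) for r in records), 2)
    return totals


def mismatched_totals(computed, reported, tolerance=0.01):
    """List the reported control totals that the received data do not reproduce."""
    return sorted(
        name for name, value in reported.items()
        if name not in computed or abs(computed[name] - value) > tolerance
    )


records = [{"billed": 100.0, "paid": 80.0}, {"billed": 50.5, "paid": 40.25}]
print(mismatched_totals(control_totals(records, ["billed", "paid"]),
                        {"record_count": 2, "billed": 150.5, "paid": 120.0}))  # ['paid']
```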

As shown in block 404 of FIG. 4, the customer or TPA sends a layout and a sample database so that tests can be run before all of the data are extracted. Valid ranges of variables are checked, as shown in block 406. Control totals are matched, and encrypted IDs may be tested. The sample need not be aggregated and tested, since it is a small subset of the data universe, but its conformity to the layout is checked.

If the sample database is accurate, the entire universe of data is processed, as shown in block 408.

If the database and layout do not correspond, or there are data values outside the range of legitimate values, the data extraction program or layout is fixed and another sample data set or layout is tested.

The overall dates for model development, and the base period for actual cost forecasting and pricing, are established and defined, and the respective dates for each group are set before the data request. The dates for each group must now be determined for its inclusion in the model development universe.

As shown in FIG. 5, the process is perhaps easiest to understand by working backwards. A list is developed of the renewal dates for the first year of coverage that would have prospective prices set using this method, as shown in block 502.

The following Table 2 lists an example of the time sequencing for developing and implementing cost prediction models.

TABLE 2 Time Sequences for Preparing and Implementing Cost Prediction Models

Column A, Model Development (number of consecutive calendar months):
12: Base period data
3(b): Lag in data
12: Next period
3(b): Model weight (re)calibration

Column B, Model Implementation for Predicting Costs and Setting Prospective Prices(a) (number of consecutive calendar months):
12: Underwriting period data
3(b): Forecast cost, incorporate inflation forecast, and set premium
12: Policy period

(a) Column B pertains to groups that have the same renewal date (e.g., January 1).
(b) Periods greater than 3 months may be required for these phases, depending on the client's needs.

Groups need to receive a price in advance of the coverage date (for new customers) or the renewal date (for existing customers) so they can accept or reject it before the renewal coverage begins. Additionally, time for receiving data from the client or a TPA and analyzing it must be added to the lag period. We have used a three-month lag period, as in processing block 504, but it could be longer or shorter depending on database and business needs.

As shown in block 506, the beginning of the lag period is the last date on which bills can be paid for the base period of the model development period. Otherwise, the cost forecasting model would include information that would not be available in the future. The lag period information (claims paid or made) need not be used to provide an accurate cost forecast for a future time period for a particular group. The claims incurred during the next period are the dependent variable for the model of the illustrated embodiment. An estimate of claims incurred but not reported may be added if there is insufficient time for a proper run-out period (i.e., if only one base period and next period are used for model development). The lag period precedes the next period, and the base period is typically the year preceding the beginning of the lag period in the model development universe.
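Working backward from the start of the next period, the model development windows can be computed with calendar-month arithmetic. This is a minimal sketch that assumes every period starts on the first of a month and is expressed as a half-open interval [start, end). The default lengths are the 12-month base, 3-month lag, and 12-month next periods described above; the function names are hypothetical.

```python
from datetime import date


def shift_months(first_of_month, months):
    """Shift a first-of-month date by a whole number of calendar months."""
    total = first_of_month.year * 12 + (first_of_month.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


def development_periods(next_period_start, base_months=12, lag_months=3, next_months=12):
    """Return half-open (start, end) intervals for the base, lag, and next periods."""
    lag_start = shift_months(next_period_start, -lag_months)  # lag precedes next period
    base_start = shift_months(lag_start, -base_months)        # base precedes lag period
    next_end = shift_months(next_period_start, next_months)
    return {
        "base": (base_start, lag_start),
        "lag": (lag_start, next_period_start),
        "next": (next_period_start, next_end),
    }


print(development_periods(date(2000, 1, 1))["base"])
```

Passing a larger `lag_months` models the longer lag periods that TABLE 2 notes may be required.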

Table 2 illustrates one example of timing for the processing of block 508. Column A represents the model development period, and Column B represents the timing for applying cost forecasting and prospective pricing. The model development period precedes the actual pricing period, but the two overlap, since the next period of the model development period is used as part of the underwriting period when the cost forecasting and pricing models are applied. The timeline is modified when longer lag periods are required. Column B pertains to groups with the same renewal date; separate flowcharts may be used to represent each renewal date.

Illustrated in FIG. 6 is a flowchart of the data validation and standardization procedures for steps 102, 202 and 302 of the methods of FIGS. 1-3.

Preliminary data validation checks are performed, followed by initial data preparation as a second set of data checks, as shown in block 602. A file structure that allows standards to be compared to the data before data aggregation facilitates this procedure.

As shown for processing by block 604, medical claims include diagnoses that are typically coded in ICD-9-CM codes, procedures coded in CPT codes, prescriptions coded using NDC codes, and hospitalizations coded using DRGs, ICD-9-CM, and other codes that may appear on claims. Tables are developed that contain the valid values for all of these codes. These tables serve as standards for comparison with the customer's data, and the values in the data must correspond to valid values in these coding systems.

As shown in block 606, tables are made for each client, because the place of service, type of provider, dates, and other fields on the claims and enrollment data will frequently have values that are idiosyncratic to a particular database or customer.

The values should preferably be put in a table format that allows checking and standardizing the data for accuracy, as shown in block 606.

As shown in block 608, the group-level time periods (see TABLE 2) may be used to screen whether claims and insureds should be in the universe; a table is used for this comparison. Prior experience permits the development of norms that can be used to check the data for reasonableness, for example the charge and payment per claim and the number of claims per person. These values are put into a table for comparison and processing in block 610.
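The table-driven standardization and validity check can be sketched as follows. It assumes a client-specific mapping from the client's place-of-service codes to standard values and a set of valid diagnosis codes; the tiny tables in the usage example are hypothetical stand-ins for the full code tables. Claims that fail either check are routed to a separate list for review, echoing the separate error file described for FIG. 6.

```python
def standardize_and_validate(claims, place_of_service_map, valid_diagnoses):
    """Split claims into standardized records and rejects.

    place_of_service_map: client code -> standard place-of-service value.
    valid_diagnoses: set of valid diagnosis codes from the standard table.
    """
    clean, rejects = [], []
    for claim in claims:
        standard_pos = place_of_service_map.get(claim["place_of_service"])
        invalid_dx = [dx for dx in claim["diagnoses"] if dx not in valid_diagnoses]
        if standard_pos is None or invalid_dx:
            # Unmapped codes or invalid diagnoses go to a separate file for review.
            rejects.append({"claim": claim, "invalid_diagnoses": invalid_dx})
        else:
            clean.append({**claim, "place_of_service": standard_pos})
    return clean, rejects


pos_map = {"11": "OFFICE", "21": "INPATIENT"}  # hypothetical client-specific table
valid = {"250.00", "401.9"}                      # hypothetical subset of a code table
claims = [
    {"id": 1, "place_of_service": "11", "diagnoses": ["401.9"]},
    {"id": 2, "place_of_service": "99", "diagnoses": ["250.00"]},
    {"id": 3, "place_of_service": "21", "diagnoses": ["XYZ"]},
]
clean, rejects = standardize_and_validate(claims, pos_map, valid)
```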

Preparation of the raw data (see block 612) involves the same data processing steps used in FIG. 4, using specified read programs.

The data (see block 614) are provided on the agreed-upon medium; the data are read and control totals are checked (see block 616). If errors are noted, the cause is determined and corrected.

In the illustrated embodiment, the raw data are reformatted (see 618) into a SAS database. Other database software capable of handling large-scale databases (e.g., SPSS, Oracle) could also be used.

In subsequent process steps of FIG. 6, the fields are reformatted (see 620) so that the values correspond to the standard tables; the group-level time period tables (see TABLE 2) are used to extract (see 622) the universe of relevant claims and insured people; and claims for people who are not in the model development universe are put into a separate file (see 624). Data following the model development universe time period may fit into the underwriting period data that will be used for applying cost forecasting and pricing.

The claims and enrollment data from the model development universe are compared to the standards (see 626), and a decision is made (see 628) whether the data comply with the standards.

Data that do not match the standards are put into a separate file (see 630). The cause of the mismatches is evaluated, and the data are deleted or corrected where appropriate. Records may need to be sent back to the customer for replacement or correction. If there is a large number of mismatches, they must be fixed before aggregation.

The records that match the standards are then matched and merged into person-level summaries (see 632). Incomplete data should not be aggregated, as they will be misleading.

FIG. 7 is a flowchart illustrating the matching and merging (integration) of data in process steps 102, 202 or 302 of FIGS. 1-3.

To match and merge the enrollment and claims data, there must be a unique identifier for the group, for the family within the group, and for the enrollee or dependent within the family, as indicated in the processing of block 702. The social security number or other identifier is encrypted so that actual people cannot be identified, and group numbers are used instead of company names. Street addresses are not used, so people cannot be personally identified. However, records need to be linked for accurate models and pricing. One effective linking system uses the group ID as a prefix, the encrypted social security number of the enrollee as the family ID, and the enrollee or dependent number as the person ID. Birth dates and sex are useful as checks on the ID.
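A minimal sketch of the composite identifier follows. The source says the social security number is encrypted; this sketch swaps in a keyed one-way hash (HMAC-SHA256 from Python's standard library) as one way to pseudonymize it, which is an assumption rather than the method of the disclosure. The key, the 16-character truncation, and the ID layout are likewise hypothetical.

```python
import hashlib
import hmac


def family_id(enrollee_ssn, secret_key):
    """Pseudonymize the enrollee SSN with a keyed one-way hash (HMAC-SHA256)."""
    digits = "".join(ch for ch in enrollee_ssn if ch.isdigit())  # ignore dashes/spaces
    return hmac.new(secret_key, digits.encode("ascii"), hashlib.sha256).hexdigest()[:16]


def person_key(group_id, enrollee_ssn, member_number, secret_key):
    """Group ID prefix + pseudonymized family ID + enrollee/dependent number."""
    return f"{group_id}-{family_id(enrollee_ssn, secret_key)}-{member_number:02d}"


key = b"example-secret"  # hypothetical; a real key would be stored separately
print(person_key("G001", "123-45-6789", 1, key))
```

Because the same key yields the same family ID in every file, claims and enrollment records still link, while the key must be kept out of the analytic database.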

As shown in processing blocks 704 and 706, the claims data are prepared separately, and a look-up table is generated that lists the group, family, and person ID for all claims with the respective birth date and sex.

In accordance with processing blocks 708 and 710, the enrollment data are used to develop a separate enrollment look-up table that contains the same information as the claims look-up table. The enrollment table will have more entries, since not every person in the group necessarily has a claim, but every person should be in the enrollment file.

The processing for the respective blocks of FIG. 7 is described as follows:

712 The tables are merged and compared. The claims table should be a subset of the enrollment table; claim IDs that do not match enrollment IDs indicate an error. These claims are put into a separate file and analyzed manually.

714 The claims records that match enrollment records are merged together into one long, variable-length record.

716 The person-level merged file contains the enrollment information and claim information, but the record is not aggregated.

718 A flag is assigned to people who have both claims and enrollment information, since these records will require aggregation.

720 A flag is assigned to people who do not have any claims, since their records do not require aggregation.

722 Additional data validation checks are made, such as the number of insureds per group and the percentage of people within each group who have no claims.

724 If there are aberrations in the data, there is a manual review. If that does not fix the problem, the errors are reviewed with the customer.

726 The data are valid and ready to be transformed into the analytic database.
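The look-up table comparison, flags, and group-level check of blocks 712 through 722 can be sketched as below. The sketch assumes each look-up table is a dictionary keyed by the composite person ID with (group ID, birth date, sex) as the value; treating a birth-date or sex disagreement as an error follows the use of those fields as ID checks. Function and variable names are hypothetical.

```python
from collections import defaultdict


def merge_lookup_tables(enrollment, claims_lookup):
    """Compare claims and enrollment look-up tables (blocks 712-722).

    Both arguments map person_key -> (group_id, birth_date, sex).
    Returns (error_keys, has_claims, pct_no_claims_by_group).
    """
    # Claim IDs missing from enrollment, or whose birth date/sex disagree, are errors.
    errors = {key for key, info in claims_lookup.items() if enrollment.get(key) != info}

    # Flag enrollees with matched claims; others (including pending errors) have none.
    has_claims = {key: key in claims_lookup and key not in errors for key in enrollment}

    # Validation check: percentage of people in each group with no claims.
    members, without_claims = defaultdict(int), defaultdict(int)
    for key, (group_id, _birth_date, _sex) in enrollment.items():
        members[group_id] += 1
        if not has_claims[key]:
            without_claims[group_id] += 1
    pct_no_claims = {g: 100.0 * without_claims[g] / members[g] for g in members}
    return sorted(errors), has_claims, pct_no_claims
```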

FIG. 8 is a flowchart illustrating the aggregation and risk factor coding for steps 102, 202 or 302 of the processes of FIGS. 1-3. The respective processing blocks of FIG. 8 are described as follows:

802 The claims data are sorted by person ID and by the incurred date of the claim.

803 This sort allows a final screening on chronological eligibility. A person in the group typically needs at least one day of eligibility in both the base period and the next period, and continuous eligibility between those dates; otherwise, the person is dropped from the modeling database. If a person loses eligibility before the next period, he or she is dropped from the entire analytic database. If the person enrolls in the lag period, that person is kept in a separate analytic database; this category of people will have their next-period payments compared to those of people with similar demographics. If a person is enrolled in the base period and disenrolls during the next period, that person is put into a separate file in the analytic database; their next-period payments will be compared to those of people with the same characteristics who did not leave during the next period. People in other time sequences may be dropped from the analytic database.
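The screening rules of block 803 can be expressed as a small classification function. This sketch assumes each person has a single continuous enrollment span with inclusive start and end dates, and that the modeling database requires enrollment through the end of the next period (people who disenroll during the next period go to their own file, as described above). The category labels and names are hypothetical.

```python
from datetime import date


def eligibility_category(enroll_start, enroll_end, lag_start, next_start, next_last):
    """Classify one continuous enrollment span (inclusive dates) for step 803.

    lag_start: first day of the lag period (the base period ends the day before).
    next_start / next_last: first and last days of the next period.
    """
    if enroll_end < next_start:
        return "dropped"               # lost eligibility before the next period
    if enroll_start < lag_start:       # enrolled on or before the last base-period day
        if enroll_end >= next_last:
            return "modeling"          # continuous from base period through next period
        return "disenrolled_in_next"   # separate file; compared with similar stayers
    if enroll_start < next_start:
        return "enrolled_in_lag"       # separate analytic database
    return "dropped"                   # other time sequences


print(eligibility_category(date(1998, 1, 1), date(2001, 6, 30),
                           date(1999, 10, 1), date(2000, 1, 1), date(2000, 12, 31)))  # modeling
```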

804 A new record is produced for each person. It includes the enrollment data and, when available, information extracted from the claims. The risk factors use ICD-9-CM codes, CPT codes, place of service, provider type, demographic data, and other variables (see the risk factor listing in Appendix G). As the records for a person are read, the ICD-9-CM diagnosis codes, CPT codes, and other variables used to define the risk factors are extracted from the claim records. The new record is a vector of variables that are initialized to zero and incremented by one each time that variable is read in the claims. These variables are coded from base period claims only. Payments and charges are summed for the base period, lag period, and next period. It is important to compare the expected cost from the forecasting model with the actual next-period cost of those who were not in the modeling universe; if there are large discrepancies, the model may need adjustment.
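The person-level record of block 804 can be sketched as below: a zero-initialized counter per risk-factor code, incremented only by base-period claims, plus charge and payment sums by period. The claim dictionary layout and the code vocabulary are hypothetical, and the periods are assumed to be half-open date intervals.

```python
from datetime import date


def person_record(claims, vocabulary, base_start, lag_start, next_start, next_end):
    """Build one person's aggregated record from his or her claims.

    claims: dicts with 'incurred' (date), 'codes' (list of str), 'charge', 'payment'.
    vocabulary: the codes that define risk-factor counters (hypothetical list).
    Periods are half-open: base = [base_start, lag_start), lag = [lag_start, next_start),
    next = [next_start, next_end).
    """
    counts = {code: 0 for code in vocabulary}  # vector initialized to zero
    sums = {p: {"charge": 0.0, "payment": 0.0} for p in ("base", "lag", "next")}
    for claim in claims:
        incurred = claim["incurred"]
        if base_start <= incurred < lag_start:
            period = "base"
        elif lag_start <= incurred < next_start:
            period = "lag"
        elif next_start <= incurred < next_end:
            period = "next"
        else:
            continue  # outside the development window
        if period == "base":
            # Risk-factor counters are coded from base-period claims only.
            for code in claim["codes"]:
                if code in counts:
                    counts[code] += 1
        sums[period]["charge"] += claim["charge"]
        sums[period]["payment"] += claim["payment"]
    return counts, sums


record, totals = person_record(
    [{"incurred": date(1999, 1, 5), "codes": ["250.00"], "charge": 100.0, "payment": 80.0}],
    ["250.00"], date(1998, 10, 1), date(1999, 10, 1), date(2000, 1, 1), date(2001, 1, 1),
)
print(record)  # {'250.00': 1}
```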

806 The risk factors are then coded by processing the information on each person's aggregated record (see Appendix G). Risk factors were developed using a combination of expert medical opinion, statistical analyses, and knowledge of the medical insurance market. Diagnoses are divided into diseases and conditions and by inherent risk. Procedures are divided by body system, type of test, type of procedure, and type and site of care. Other risk factors are designed based on the relationship to the enrollee, family composition, and demographics. There is a trade-off between a very specific risk factor that has very few but very homogeneous people in it and a broad risk factor that has heterogeneous people in it. Correlations with next-period payments and regression models are two ways to determine empirically whether a risk factor is worthwhile. The base period charges and payments, plus the shape of the relative amounts of those payments by month, day, or other unit of time, are some of the strongest risk factors (see TABLE 4). The amount of time enrolled in the base period is another risk factor. The key is developing robust risk factors that are not too heterogeneous; a priori logic plus trial and error are useful approaches. Our candidate risk factor codes are listed in Appendix G: Risk Factors. TABLE 5 illustrates two family composition risk factors.

TABLE 4 Risk Factors for Person-Level Experience

Hibymos1: The maximum cost per day for any month in the base period
Hibymos2: The second-highest cost per day for any month in the base period
Hibych2a: (1, 0) 1 = The second-highest month's cost per day is adjacent to the highest month
Hibych2b: (1, 0) 1 = The second-highest month's cost per day is not adjacent to the highest month
Hi1dvby: Index of the highest cost per day divided by the average cost per day per month
Hi2dvby: Index of the second-highest cost per day divided by the average cost per day per month
Tenmoch: Average cost per day over all months in the base period, excluding the 2 highest months
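The TABLE 4 variables can be computed from a person's monthly cost-per-day series. This sketch makes several interpretive assumptions: each month's cost per day has already been computed (the month's cost divided by days enrolled that month), "average cost per day per month" is taken as the mean of the monthly values, Tenmoch is the mean of the remaining months, and ties for the highest months go to the earlier month. Variable names follow TABLE 4 in lowercase; the function name is hypothetical.

```python
def table4_features(cost_per_day_by_month):
    """Compute TABLE 4 style features from base-period monthly cost per day.

    cost_per_day_by_month: one value per base-period month, in calendar order.
    Ties for the highest months are broken in favor of the earlier month.
    """
    n = len(cost_per_day_by_month)
    if n < 2:
        raise ValueError("need at least two months of data")
    # Indices of months ordered from highest to lowest cost per day (stable sort).
    order = sorted(range(n), key=lambda i: cost_per_day_by_month[i], reverse=True)
    i1, i2 = order[0], order[1]
    hi1, hi2 = cost_per_day_by_month[i1], cost_per_day_by_month[i2]
    average = sum(cost_per_day_by_month) / n
    remaining = [c for i, c in enumerate(cost_per_day_by_month) if i not in (i1, i2)]
    adjacent = abs(i1 - i2) == 1
    return {
        "hibymos1": hi1,
        "hibymos2": hi2,
        "hibych2a": int(adjacent),
        "hibych2b": int(not adjacent),
        "hi1dvby": hi1 / average if average else 0.0,
        "hi2dvby": hi2 / average if average else 0.0,
        "tenmoch": sum(remaining) / len(remaining) if remaining else 0.0,
    }


f = table4_features([0, 0, 10, 30, 20, 0, 0, 0, 0, 0, 0, 0])
print(f["hi1dvby"], f["tenmoch"])  # 6.0 1.0
```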

TABLE 5 Risk Factors - Family Composition

Ensxkd: Combines the use of Employee Relationship ('1' = A Enrollee, '2' = B Spouse, '3' = C Son, '4' = D Daughter, '5' = E Stepson, '6' = F Stepdaughter, '7' = G Other Male, '8' = H Other Female, '9' = I Surviving Spouse) and Gender of Enrollee ('M' = male, 'F' = female). Values for ensxkd:
1 Enrollee, male
2 Enrollee, female
3 Spouse, male
4 Spouse, female
5 Son, daughter, stepson, or stepdaughter
6 Other female or surviving spouse

Kid1_3: Count of the number of children in a family: 0 = no children; 1, 2, or 3 or more children
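The two family composition variables of TABLE 5 can be coded as follows. The sketch assumes that any sex code other than 'M' for an enrollee or spouse is treated as female, and it returns None for relationship '7' (Other Male), which has no listed ensxkd value. Codes '3' through '6' are counted as children for kid1_3, capped at 3 for "3 or more". Names are hypothetical apart from the TABLE 5 variable names.

```python
CHILD_CODES = {"3", "4", "5", "6"}  # son, daughter, stepson, stepdaughter


def ensxkd(relationship, sex):
    """Map a relationship code ('1'..'9') and sex ('M'/'F') to the TABLE 5 value."""
    if relationship == "1":
        return 1 if sex == "M" else 2
    if relationship == "2":
        return 3 if sex == "M" else 4
    if relationship in CHILD_CODES:
        return 5
    if relationship in {"8", "9"}:
        return 6
    return None  # e.g. '7' (Other Male) has no value listed in TABLE 5


def kid1_3(family_relationship_codes):
    """Number of children in a family, capped at 3 ('3 or more')."""
    return min(sum(1 for code in family_relationship_codes if code in CHILD_CODES), 3)


print(ensxkd("2", "F"), kid1_3(["1", "2", "3", "4", "4", "6"]))  # 4 3
```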

Some insurance plans pay on the basis of a combination of fee-for-service (FFS) payments and capitation payments. The previous discussion has assumed an FFS payment system. If a combination or hybrid payment system is used, adjustments for capitation payments must be made at the person and group levels. We recommend developing risk factors as dummy variables when there are capitation payments for particular provider types (e.g., primary care, ob/gyn). This is especially important when capitation coverage is not consistent across groups or geographic regions.

808 Validation checks can now be made on person-level data. Frequency counts for dichotomous or categorical variables are prepared and compared among groups, geographic areas, and time periods, as well as against norms. Missing-value percentages are calculated by group, time period, and geographic area for each risk factor. The mean number of claims per day and mean dollars per claim (which can be Winsorized) are calculated by group, time period, and geographic region. Large discrepancies in the number or average size of claims are reviewed and analyzed to uncover data errors. The ratio of charges to payments is calculated by group, time period, and geographic region and compared with norms.

810 Aberrant results are evaluated to determine whether there is an error. If the data cannot be corrected or replaced, those people are dropped from the model universe.

812 The model universe is then ready for final preparation for analysis.

FIG. 9 is a flowchart of processing steps for developing cost forecasting models based on “inlier” data in steps 106, 204, 210, 304 or 310 of the methods of FIGS. 1-3. The processing blocks of FIG. 9 are described as follows:

901 A clean analytic database is required as the modeling universe. Otherwise, spurious results will lead to idiosyncratic, unreliable models or, at best, weakly predictive models.

902 The modeling universe database is separated into Winsorized data (i.e., inliers) and outlier data. There is an “inlier” model with a Winsorized dependent variable and an “outlier” model that uses the difference between the actual next-period claims and their Winsorized values. The independent variables are similar for the inliers and outliers. We have found that models are more accurate when average payments per day are used as the dependent variable and average charges per day (and components of it, such as the lowest ten months' average charge per day) are used as predictor variables. Cost per day adjusts for persons not enrolled for a complete year.

The Winsorization point is typically set so that the top 5% of payments per day lie at or above it (i.e., at the 95th percentile). If that value is $55 per day, the inlier model uses $55 per day as the dependent variable for people with payments greater than or equal to $55 per day. People with payments under $55 per day keep their original dependent variable.

The database for the outlier models flags people with next-period payments greater than or equal to the Winsorization value (e.g., $55 per day): the flag equals one if they are at or over the Winsorization amount and zero otherwise. In addition, the actual next-period payments per day less the Winsorization amount are calculated; if the result is negative, the outlier payment is set to zero.
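The inlier/outlier split can be sketched in a few lines. When no cap is supplied, this sketch estimates the 95th percentile with Python's `statistics.quantiles` (inclusive method), which is one of several reasonable percentile definitions and is an assumption here; the function name is hypothetical. By construction, the inlier value plus the outlier amount reproduces the original payment per day.

```python
import statistics


def winsorize_split(payments_per_day, cap=None):
    """Return (cap, inlier values, outlier flags, outlier amounts)."""
    if cap is None:
        # Last of 19 cut points for n=20 is the 95th percentile.
        cap = statistics.quantiles(payments_per_day, n=20, method="inclusive")[-1]
    inlier = [min(p, cap) for p in payments_per_day]                # Winsorized dependent variable
    outlier_flag = [1 if p >= cap else 0 for p in payments_per_day]  # at or over the cap
    outlier_amount = [max(p - cap, 0.0) for p in payments_per_day]   # negatives set to zero
    return cap, inlier, outlier_flag, outlier_amount


cap, inlier, flags, excess = winsorize_split([10.0, 55.0, 80.0, 0.0], cap=55.0)
print(inlier, flags, excess)  # [10.0, 55.0, 55.0, 0.0] [0, 1, 1, 0] [0.0, 0.0, 25.0, 0.0]
```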

903 The Winsorized modeling universe database is separated into two components: individuals with claims in the base period and individuals without claims in the base period. Those without claims have only demographic risk factors, whereas those with claims have a payment history and clinical information as additional risk factors. Those without claims are, on average, lower in risk than those with claims.

904 The no-claims database includes demographic variables, such as age and the family relationship to the enrollee, plus risk factors from the enrollment file.

906 People with claims in the base period also have the enrollment file risk factors plus risk factors derived from the claims file.

An example of a program segment to run an OLS regression model on the inlier data for people with claims is as follows:

*** 5th root of Winsorized cost is the DEP measure;
*** OLS MODEL;
proc reg data='DATA WITH CLAIMS' outest='OLS 1ST MODEL FOR LAD CART';
   exp9olsd: model w5_6850 = ensagen sq5chg1 sq5chg2a sq5chg2b sq5oth agesq
                             h5bchg1 h5bchg2a h5bchg2b ten5moch zeroa zerob zerooth
                             enrldayb hibymos1 hibymos2 hi1dvby hi2dvby
                             / selection=backward details;
   /* the original listing named both selection=stepwise and selection=backward; one method is kept here */
run;
quit;

*** Score the data with the OLS estimates; the predicted value is named exp9olsd;
proc score data='DATA WITH CLAIMS' score='OLS 1ST MODEL FOR LAD CART'
           out='DATA WITH CLAIMS' type=parms predict;
   var ensagen sq5chg1 sq5chg2a sq5chg2b sq5oth agesq h5bchg1 h5bchg2a h5bchg2b
       ten5moch zeroa zerob zerooth enrldayb hibymos1 hibymos2 hi1dvby hi2dvby;
run;

*** CHECK RESULTS;
proc means data='DATA WITH CLAIMS';
   class modeled;
   var w5_6850 exp9olsd;
run;

proc corr data='DATA WITH CLAIMS';
   var w5_6850 exp9olsd;
   where modeled eq 'YES';
run;

908 The initial person-level model for people with claims uses the continuous independent variables only. Examples include age, the number of days enrolled in the base period, charges in the peak spending month, and average charge per day in the lowest ten months. The dependent variable is the Winsorized payment per day in the next period (or a transformation of it, such as the fifth root). An ordinary least squares (OLS) model has been used. Other forms of regression models (e.g., median or robust) or neural networks could be used. The example given in the software in the CD-ROM Appendix does not include this step, but the program above does provide an example. This step can be important when there are several numerical candidate predictor variables.
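The OLS step in 908 can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the SAS program above. It assumes the predictors are already coded as a numeric matrix and uses the fifth-root transform mentioned in the text. The function names are hypothetical. Stepwise selection is not reproduced. Predictions stay on the fifth-root scale, like `exp9olsd` in the SAS listing, because that value is what feeds the tree in 910.

```python
import numpy as np


def fit_ols_fifth_root(X, wins_pay_per_day):
    """OLS of the fifth root of Winsorized next period payment per day on continuous predictors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(wins_pay_per_day, dtype=float) ** 0.2  # fifth-root transform of the dependent variable
    design = np.column_stack([np.ones(len(X)), X])         # intercept column plus predictors
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_fifth_root(coef, X):
    """Expected value on the fifth-root scale (the analogue of exp9olsd in the SAS program)."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(len(X)), X]) @ coef
```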

910 The expected payments per day from the previous step are used as an input to the next model, along with the categorical variables (e.g., sex, site of care, diagnosis). We have found that a regression tree is a very effective method for capturing the interactions between the clinical variables and the amount charged in the base period. The CART software with the median regression tree option has produced the best results to date. Other forms of data mining (e.g., rule induction, clustering, genetic algorithms, neural networks) could also be used. The key is to capture the interactions between base period charges and both clinical and demographic risk factors. An example of a program to run a CART median regression tree using expectations created from the OLS regression (see 908) and other risk factors is found in Appendix A.

912 A CART median regression tree or other data mining technique is used to model the “no claims” Winsorized database. The first model (i.e., the one for continuous variables used in 908) is omitted, since the continuous variables derived from claims are not available for this universe; only age and length of enrollment are. This model uses the same statistical techniques as 910, but its independent variables are limited to those that can be derived from the enrollment file. The output from the regression tree (terminal nodes) identifies groupings of people that have homogeneous next period payments.

914 The regression tree's terminal nodes group people with similar median payments next period. A set of dummy variables is developed that identifies the people in each terminal node. These dummy variables, the variables that were used to form them, and the significant variables from 908 are entered into a final prediction model. We have used OLS, but other techniques, such as median or robust regression, neural networks or other modeling methods, could be used instead. The result of these models is an expected payment per person per day in the next period. This includes only the Winsorized portion of the payments for people with claims in the base period. An example of a program to run OLS regression using the terminal nodes from the regression tree and other important risk factors from the tree (see 910 and Appendix A) is found in Appendix B.
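The final model in 914 can be sketched as follows. It builds terminal-node dummy variables and fits them by OLS together with the continuous predictors. This is a hypothetical NumPy illustration, not the Appendix B program. It assumes each person already carries a terminal node identifier from the tree. The first node (in sorted order) is dropped as the reference category to avoid collinearity with the intercept.

```python
import numpy as np


def node_dummies(node_ids):
    """0/1 indicator columns for each terminal node except the first (reference) node."""
    levels = sorted(set(node_ids))
    others = levels[1:]
    D = np.array([[1.0 if n == lvl else 0.0 for lvl in others] for n in node_ids])
    return D.reshape(len(node_ids), len(others)), others


def fit_final_model(node_ids, X_cont, y):
    """OLS of Winsorized payment per day on node dummies plus selected continuous predictors."""
    D, _ = node_dummies(node_ids)
    X_cont = np.asarray(X_cont, dtype=float).reshape(len(node_ids), -1)
    design = np.column_stack([np.ones(len(node_ids)), D, X_cont])
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef  # [intercept, node effects..., continuous-variable slopes...]
```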

916 The same technique as 914 is applied to the model output from 912. The result of this model is the expected payments per day for the next period for people who do not have claims in the base period.

918 Model testing can be done at this point or after each step in the modeling process (i.e., after 908, 910, and 914 for the models for people with claims). It is probably more efficient to do it after the final step. Five criteria are used in model evaluation in the illustrated embodiment: the mean absolute residual, r², the accuracy measure (previously defined), bias, and cross validation. The mean absolute residual, the accuracy measure and r² relate to the accuracy of the forecast. Bias refers to systematic over- or under-prediction when cases are sorted by their expected value. Regression models can be biased, but regression trees are not. Cross validation refers to the accuracy of the models when they are applied to different sets of data. The tree software tests for cross validation. Hold-out samples can be used for testing the entire hybrid models. An example of a program to run bias test, mean absolute residual, and r² analyses (examples of model testing) is found in Appendix C.
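Three of the criteria in 918 can be sketched in Python as follows. This is not the Appendix C program. The r² shown is the usual 1 - SS_res/SS_tot of actual versus expected values. The bias check sorts cases by expected value, splits them into groups of near-equal size (10 by default, a hypothetical choice), and compares mean expected with mean actual cost in each group. The "accuracy measure" is defined elsewhere in the document and is not reproduced here. Cross validation is also left to the tree software or hold-out samples.

```python
import numpy as np


def mean_absolute_residual(actual, expected):
    return float(np.mean(np.abs(np.asarray(actual, float) - np.asarray(expected, float))))


def r_squared(actual, expected):
    a = np.asarray(actual, float)
    e = np.asarray(expected, float)
    ss_res = np.sum((a - e) ** 2)
    ss_tot = np.sum((a - a.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def bias_table(actual, expected, n_groups=10):
    """Sort by expected value, split into groups, return (mean expected, mean actual) per group."""
    a = np.asarray(actual, float)
    e = np.asarray(expected, float)
    order = np.argsort(e, kind="stable")
    return [(float(e[idx].mean()), float(a[idx].mean())) for idx in np.array_split(order, n_groups)]
```

A systematic pattern in the bias table (for example, actual exceeding expected in the top groups) signals the over- or under-prediction that 918 describes.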

920 The same tests of model quality are applied to the models developed on people without claims in the base period. The model tests are probably most efficiently applied here at 920, after the final model (916) is developed. These models will have far less predictive accuracy than the models covering people with base period claims, since there are fewer risk factors and the variability in next period payments is not very predictable.

FIG. 10 is a detailed flowchart of process steps for developing cost forecasting models based on the “outlier” portion of the Winsorized data for steps 106, 204, 210, 304 or 310 of the methods of FIGS. 1-3. The illustrated processing blocks of FIG. 10 are described as follows:

1002 The outlier database has next period payments of zero for everybody whose payments were below the Winsorization point and the amount above the Winsorization point for everybody else. The outliers can have very high cost per day, so the variability is very large. Therefore, we have chosen to model the outlier portion separately. This two-step approach leads to more accurate and stable results, since the extreme outliers are almost impossible to predict accurately.

1004 People with base period claims are modeled separately, as they have risk factors not available for people without base period claims (e.g., diagnosis and amount charged).

1006 People with no base period claims are modeled separately, since they have only the risk factors available from the enrollment file.

1008 The same continuous risk factors used in 908 are used to model the probability of these people having payments above the Winsorization point. The dependent variable is 1 if the total amount of next period payments is above the Winsorization point and zero otherwise. A logistic regression is used to estimate the probability of each person's next period payments exceeding the Winsorization point. Other types of regression models (median or robust), neural networks, or other predictive modeling can be used instead of logistic regression.

A program to run a logistic regression probability model on outliers with claims follows.

** HILO is 1 = outlier, 0 = inlier;
proc logistic data='DATA WITH CLAIMS' outest='LOGISTIC WEIGHTS';
  /* event='1' models P(HILO = 1); by default the lower ordered value (0) would be modeled */
  model HILO(event='1') = ensagen
        sq5chg1 sq5chg2a sq5chg2b sq5oth agesq
        h5bchg1 h5bchg2a h5bchg2b ten5moch
        zeroa zerob zerooth enrldayb
        hibymos1 hibymos2 hi1dvby hi2dvby;
  /* p= writes the predicted probability; PROC SCORE on these weights would return log-odds */
  output out='DATA WITH CLAIMS' p=philo;
run;
data 'DATA WITH CLAIMS';
  set 'DATA WITH CLAIMS';
  /* &MEAN_OF_OUTLIERS is a placeholder for the mean outlier payment per day */
  exphilo = philo * &MEAN_OF_OUTLIERS;
run;
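For readers without SAS, the probability model in 1008 can be sketched as a logistic regression fitted by Newton-Raphson in NumPy. This is a generic illustration of the technique, not the program above. The function names, the fixed iteration count, and the absence of a convergence check or separation handling are simplifying assumptions. At the maximum-likelihood solution with an intercept, the predicted probabilities sum to the observed number of outliers, which is a convenient sanity check.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Logistic regression of the outlier flag (1/0) on predictors, by Newton-Raphson."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    A = np.column_stack([np.ones(len(X)), X])  # intercept plus predictors
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))    # current probabilities of being an outlier
        grad = A.T @ (y - p)                   # score vector
        hess = A.T @ (A * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def predict_prob(beta, X):
    A = np.column_stack([np.ones(len(X)), np.asarray(X, float)])
    return 1.0 / (1.0 + np.exp(-A @ beta))
```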

1010 The model is tested for accuracy using the criteria described in 918. Note that the probability of each person being an outlier is being modeled, rather than each person being classified as an outlier or not. All of the techniques from processing block 918 of FIG. 9 are applicable.

1012 A regression tree is used to refine the estimated probability of being an outlier. The dependent variable is the same as in 1008. We recommend a least squares regression tree, but other types of predictive models that capture interactions could be used (e.g., neural networks, rule induction or genetic algorithms). The expected value from the logistic regression plus all of the categorical risk factors from the claims data and enrollment file are used as candidate independent variables (see 910). The output is a set of terminal nodes of a least squares regression tree that have homogeneous probabilities of being an outlier. The probability for each person is determined by that person's terminal node. Note that this is not a classification tree.

A program to run a CART least squares probability tree on outlier data with claims, using expectations from the logistic regression (see 1008) and other risk factors, is found in Appendix D.

1014 The same methods are applied to the people with no claims data (see 1012). The output is a set of groupings of people with homogeneous probabilities of being an outlier.

1016 and 1018 The models are tested for accuracy, bias and cross validation, as the models were tested in 918.

1017 and 1019 The terminal nodes and the risk factors defining those terminal nodes are used as input into another logistic regression or other forecasting technique (see 914 and 916). The examples in Appendix E are for 1017, since it includes data from claims.

1020 and 1022 For each terminal node, the median payment above the Winsorization point next period is calculated. When the medians are not significantly different, the terminal nodes are combined (for the mean above the Winsorization point) for additional stability. Note that the probabilities are not combined. The means are calculated arithmetically for the people in the combined terminal nodes and for those kept in separate nodes due to their distinctive median dollar costs. The means are then multiplied by the respective probabilities for each person, giving the expected outlier payments for each person. The probability from the logistic regression (see 1017 and 1019) is used rather than the probability from the regression tree. People are “tagged” with their respective terminal nodes (see 1012 and 1014) so that the correct mean is multiplied by the probability.
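The pooling and multiplication in 1020 and 1022 can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's program. The significance test for comparing medians is not implemented. Instead, the caller supplies a hypothetical `node_groups` mapping in which nodes judged not significantly different share a label. Each pooled mean is taken over people with a positive amount above the Winsorization point, which is an assumption. The probabilities stay person-level, as the text requires.

```python
from collections import defaultdict
from statistics import mean


def expected_outlier_payments(people, node_groups):
    """
    people: list of dicts with keys
        'node'   - terminal node from the probability tree (1012/1014),
        'prob'   - probability of being an outlier from the logistic model (1017/1019),
        'excess' - next period payment per day above the Winsorization point (0 for inliers).
    node_groups: mapping node -> pooled group label (hypothetical; the median test is not shown).
    """
    amounts = defaultdict(list)
    for p in people:
        if p["excess"] > 0:  # only outliers contribute to the mean amount above the point
            amounts[node_groups[p["node"]]].append(p["excess"])
    group_mean = {g: mean(v) for g, v in amounts.items()}
    # Probabilities are not pooled: each person's own probability times the pooled mean.
    return [p["prob"] * group_mean.get(node_groups[p["node"]], 0.0) for p in people]
```

Adding each person's expected inlier payment to this value gives the total expected cost described in 1024.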

1024 The inlier Winsorized cost forecast and the expected cost of the outlier portion are summed to give the total expected cost for the next period.

The process of scoring the data refers to applying the model to a set of data. The data need not be the same data on which the model was developed. However, it is best if the weights are derived from that client's book of business. The data need to have the same risk factors coded on them that were included in the models of the probability of being an outlier and in those used for the expected inlier payment calculations. Also, the models must be applied to a universe of people defined using the same criteria that were used to define the model universe. The model gives a set of weights applied to individual risk factors or combinations of risk factors, yielding the expected payments or probability. Most statistical packages or data mining software have automated methods for scoring data once the risk factors are properly coded.

Illustrated in FIG. 11 is a detailed flowchart for scoring, testing and integrating the data, and adjusting for cost trends, for use in steps 106, 204, 210, 304 or 310 as well as 108, 208 and 306 of the methods of FIGS. 1-3. The description is written as steps in developing the model, so the data are referred to as the base and next periods. Applying the model to the actual underwriting data is essentially the same, and it produces the policy period expected cost. The respective processing blocks of FIG. 11 are described as follows:

1102 The probability of a person being an outlier (i.e., having policy period payments greater than the Winsorization point) is calculated for all people without claims. Their probabilities will be lower than those of people with base period claims.

1104 The mean for each terminal node or group of terminal nodes (block 1022 of FIG. 10) is multiplied by the associated probability. This calculates the amount over the Winsorization point that each person is expected to cost in the next period, i.e., the expected outlier dollars per day for each person. The mean expected outlier dollars per day for each person is well below the Winsorization point.

1106 and 1108 The same process is applied to the outlier probability model and mean policy period payments for people who have base period claims. The expected value is calculated by multiplying the probability by the mean.

An example of a program to score the outlier data with claims (see 1017) is as follows:

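proc score data='DATA FROM CART' score='LOGISTIC OUTPUT' out='DATA WITH CLAIMS'
           type=PARMS predict;
  var ensagen agesq exp9olsd exp9sqd sq5oth
      ten5moch dxresp othdiges hi2dvby dxdigest dxcircul
      tnde5ls1 tnde5ls3-tnde5ls5
      ensxkd1a ensxkd2b ensxkd3c ensxkd4d ensxkd6f;
run;
data 'DATA WITH CLAIMS';
  set 'DATA WITH CLAIMS';
  /* PROC SCORE returns the linear predictor (log-odds) of a logistic model;
     convert it to a probability before applying the outlier mean.
     &MEAN_OF_OUTLIERS is a placeholder for the mean outlier payment per day. */
  expprob = (1 / (1 + exp(-hilols))) * &MEAN_OF_OUTLIERS;
run;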

1110 and 1112 The expected next period inlier (less than or equal to the Winsorization point) payments are added to the expected next period outlier payments to produce the total expected payments in the next period for people with no claims (from 920) and for people with claims in the base period (from 918). The following program is an example of scoring inlier data with claims.

Program to run scoring of inlier with claims data (output from OLS regression, see 914):

*** Score ALL data;
proc score data='DATA WITH CLAIMS' score='OLS regression scores'
           out='DATA WITH CLAIMS' type=PARMS predict;
  var ensagen agesq exp9olsd exp9sqd sq5oth
      ten5moch dxresp othdiges hi2dvby dxdigest dxcircul
      td5lad2-td5lad13
      ensxkd1a ensxkd2b ensxkd3c ensxkd4d ensxkd6f;
run;
title2 'REPORT TO REVIEW SCORED DATA WITH MODEL UNIVERSE';
proc means data='DATA WITH CLAIMS';
  var wins6850 expolsls exp5rLAD exp5rtLs ensagen agesq exp9olsd exp9sqd sq5oth
      ten5moch dxresp othdiges hi2dvby dxdigest dxcircul
      td5lad2-td5lad13
      ensxkd1a ensxkd2b ensxkd3c ensxkd4d ensxkd6f;
  where exp9olsd ge 1.15;
run;

1114 This database includes everybody who was included in the modeling universe (i.e., the standard population). However, there are people who were enrolled in the next period but not included in the modeling universe.

1116 When everybody included in the modeling database is combined, the sum of the expected payments per day in the next period should equal the actual payments. Additional model testing is performed at this point. The same methods (see 918 and 920) that were used to test the models developed on subsets of the modeling universe are reapplied now. This summary testing is even more important than testing the components of the complete model.

1118 There are three categories of persons for whom insurers will be at risk during the next period but who are excluded from the modeling database (i.e., the standard population):

1. Persons enrolling during the lag period

2. Persons enrolling during the next period

3. Persons terminating during the next period
   a. in 1 or 2 above
   b. other categories

For those in categories 1 or 2, no base period claims data are available when the rates must be developed and offered. Consequently, no model predictions can be made for them. However, we know their actual payment costs during the next period. The following tabulations will show whether any adjustment in expected next period costs is needed for them.

Compare the next period actual costs per person per day for those in categories 1 and 2 with both the expected next period cost per person per day and the actual next period cost per person per day for those in the following categories (note that these are detailed examples of subscriber units that could also be used for pricing):

Subscriber only

Subscriber and spouse

Subscriber, spouse and 1 dependent

Subscriber, spouse and 2+ dependents

Subscriber and 1 dependent, no spouse

Subscriber and 2+ dependents, no spouse

Because outlier next period costs may distort these findings, the following quantiles of cost per person per day should also be compared to reduce the effects of outliers:

Median

75th percentile

90th percentile

If there are no significant differences between the excluded and included categories of persons, no adjustment is needed. For those categories with significant differences, the adjustment factor will be (excluded category mean next period cost/day) divided by (included category mean next period cost/day).

The number of persons in category 1 can be determined for those who actually enrolled in the lag period, while the number in category 2 can be estimated from underwriting period data. The final adjustment factor will be the product of the per person adjustment factor (as above) and the proportion of all next period person days estimated to be comprised by those in category 1. The proportion of next period person days comprised by those in the model will have an adjustment factor of 1.00.
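One possible reading of the adjustment arithmetic above is sketched below. The per person factor is the ratio of excluded to included mean cost per day. The overall factor weights that ratio by the excluded share of next period person days and weights the modeled share at 1.00. This blended-average interpretation and the function names are assumptions made for illustration, not the author's stated formula.

```python
def per_person_adjustment(excluded_mean_cost_per_day, included_mean_cost_per_day):
    """Ratio of the excluded category's mean next period cost/day to the included category's."""
    return excluded_mean_cost_per_day / included_mean_cost_per_day


def blended_adjustment(share_excluded_person_days, per_person_factor):
    """Hypothetical blend: modeled person days carry factor 1.00, excluded ones the ratio above."""
    share_model = 1.0 - share_excluded_person_days
    return share_model * 1.00 + share_excluded_person_days * per_person_factor
```

For example, if new joiners cost 20% more per day than comparable modeled members and make up 10% of next period person days, this reading gives an overall factor of about 1.02.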

The use of these adjustment factors can be further refined by applying them separately to sets of insured groups that have similar adjustment factors, instead of applying one adjustment factor to all groups.

Additional adjustment for those in category 3a above is not required, since these persons' experience will be included in the adjustment for those in categories 1 and 2. Those persons in category 3b will be included in the population used as the standard for our overall risk models. We can thus score them by their base period attributes and estimate their next period expected costs from our models. These can then be compared to the actual next period costs per person per day, in total and by the subscriber family categories listed above.

After checking for the influence of outliers, any subsets with actual values differing significantly from expected values can be the basis of adjustment. The proportions of person days in category 3b can be estimated from the available data.

As noted above, separate adjustments can be made to expected next period costs for groups that have similar adjustment component factors:

1. actual to expected costs

2. proportion of next period person days attributable to those in category 3b

There may well be an interaction between these two factors.

1120 The database of all people covered in the next period is compiled next. A flag is set to one if the person has an expected next period payment that was calculated from the risk adjustment models. Only the new joiners in the lag period or next period cannot have an expectation calculated from the risk adjustment model.

1122 When this product is used for prospective pricing of insurance coverage, the future cost of health care needs to be included. The risk adjustment models include the historical cost trend, since it was present in the data. In other words, no additional adjustment was required for the modeling: the model uses the base period to forecast next period payments, so the cost trend inherent in the data is built into the model. Note that with a 3-month lag period, this is a 15-month cost trend. If the future annual cost trend is expected to be identical to the cost trend between the base period and the next period, then no further adjustment is needed, since it is already incorporated in the data and model. If the future cost trend differs from the cost trend implicit in the data used for model development, the ratio of the future cost trend to the model period cost trend should be used as an adjustment.

All health insurance companies use an estimate of the future medical cost trend to increase future expected claim costs to what they expect them to be in the policy period. The simplest group-level cost forecast for a credible group is last year's cost multiplied by the cost trend, producing the “experience” forecast. The CI will provide a cost trend forecast for use in this invention. The development model has an implicit cost trend built into it, since the trend was present in the model development data. Therefore, the development model must be detrended, and then the CI's cost trend forecast can be applied to the person-level cost forecast when the model is applied to the underwriting period data. To detrend the development model, we calculate the cost for a standardized population for the book of business in the base and next periods. The standardized population assumes a specific mix of demographics in the CI's book of business for the base and next periods. A particular embodiment would calculate the proportion of cost in each of the following categories: male employee, female employee, male spouse, female spouse, and other dependent, cross-classified by 5-10 age categories (e.g., <5, 5-17, 18-24, 25-34, 35-44, 45-54, 55-64, 65-74, 75+). This particular classification would produce up to 40 demographic cells. Other classifications could be used. Too many cells will cause a loss of robustness in the estimates. The mean cost per person per cell in the next period divided by the associated mean cost in the same demographic cell in the base period gives the cost trend per cell during the model development period. One method to standardize the population in order to produce a single cost trend for the entire book of business is to weight each cell by the proportion of cost it accounts for in the base period. The weighted average of the cells' cost trends is a summary cost trend for the book of business, for that standard population, for the time period between the base and next periods.
If those periods are contiguous and one year each, the annual development model cost trend has been calculated. Otherwise, an adjustment must be made for the time periods to calculate an annual trend. If the base, lag and next periods are each one year, the square root of the cost trend gives the annual cost trend, since the trend compounds. If the lag period is three months and the base and next periods are one year, the midpoints of the base and next periods are 15 months (five quarters) apart, so the fifth root of the cost trend is the three-month cost trend. The three-month cost trend is raised to the fourth power to calculate the annual cost trend. To apply the CI's single-number cost trend (which will be an annual trend), the reciprocal of the annual development model cost trend is multiplied by the CI's annual cost trend. The result is the cost trend that should be applied to the underwriting period data after application of the development model. This method works for first-dollar medical insurance, aggregate-only medical stop loss, and reserving for those insurance products.
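The detrending and retrending arithmetic can be sketched in Python as follows. The function names and the tuple layout for a demographic cell are hypothetical. Annualization uses trend^(12/months), where months is the gap between the midpoints of the base and next periods (12 plus the lag in months). For a 3-month lag this gives trend^(12/15), which equals taking the fifth root and then the fourth power, as the text describes.

```python
def cell_weighted_trend(cells):
    """cells: list of (base_period_cost_share, base_mean_cost, next_mean_cost), one per demographic cell.
    Returns the base-cost-weighted average of the cell trends (next mean / base mean)."""
    total_share = sum(share for share, _, _ in cells)
    return sum(share * (nxt / base) for share, base, nxt in cells) / total_share


def annualize(trend, months_between_midpoints):
    """Convert a compounding trend over the given gap (12 + lag months) into an annual trend."""
    return trend ** (12.0 / months_between_midpoints)


def retrend_factor(ci_annual_trend, model_annual_trend):
    """Reciprocal of the development model's annual trend times the CI's annual trend."""
    return ci_annual_trend / model_annual_trend
```

With two cells holding 60% and 40% of base period cost and cell trends of 1.10 and 1.20, the summary trend is 1.14.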

The development model next period data need to be detrended and then retrended with the CI's cost trend forecast before the development model is calibrated for specific stop loss coverage, or for aggregate stop loss in combination with specific stop loss coverage. Once those adjustments are made, no additional cost trend adjustments are needed before applying the specific, or aggregate-in-combination-with-specific, stop loss models to the underwriting period data to forecast the policy period costs.

Alternatively, the CI may have cost trends calculated separately by geographic locale or by provider type (e.g., drugs, physician, inpatient hospital). If the CI's cost trend is specific to each geographic locale, the same method of demographic cell adjustments can be employed as previously described, but a separate table is calculated for each geographic locale. The CI's locale-specific cost trend is applied to the cost trend estimated for the model development period using the standardized population adjustments for each locale. Each locale's detrending and retrending is applied to the underwriting data for that locale to calculate the policy period cost for that locale.

If the CI's cost trend forecast is by provider type, we need to estimate the development model trend by provider type so that the policy period forecast will be appropriately detrended and retrended. This can be done by cross-classifying the demographic cells by provider type costs for the base and next periods and calculating the trend for each demographic cell separately by provider type. For each provider type, the cell-level cost trends are combined by weighting each demographic cell by its proportion of the base year cost for that provider type. This gives a provider type cost trend from the base to the next period for the entire book of business. The CI's forecast cost trend for each provider type is multiplied by the reciprocal of the model development cost trend for the same provider type. For each terminal node, this adjusted cost trend is multiplied by the associated policy period cost for that provider type, and the results are summed across provider types to calculate the policy period forecast cost per person. The associated policy period cost by provider type is calculated by multiplying the proportion of next period cost by provider type, for that terminal node, by the total forecast cost for the policy period for that terminal node.
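The provider type variant can be sketched in Python as follows. The original description of the weighting is ambiguous, so this sketch assumes that each cell is weighted by its share of base year cost for the given provider type. It also assumes that a person's policy period cost is the terminal-node forecast split by that node's next period provider type shares, with each share scaled by the adjusted trend. All names are hypothetical.

```python
def provider_type_trend(cells):
    """cells: (base_cost_for_type, base_mean, next_mean) per demographic cell, for one provider type.
    Returns the base-cost-weighted average of the cell trends for that provider type."""
    total = sum(cost for cost, _, _ in cells)
    return sum((cost / total) * (nxt / base) for cost, base, nxt in cells)


def adjusted_trends(ci_trend, model_trend):
    """CI forecast trend times the reciprocal of the model development trend, per provider type."""
    return {t: ci_trend[t] / model_trend[t] for t in ci_trend}


def policy_cost_per_person(node_forecast, type_shares, adjusted):
    """node_forecast: forecast cost for the person's terminal node;
    type_shares: provider type -> share of next period cost in that node;
    adjusted: provider type -> adjusted trend. Sums the trended pieces across provider types."""
    return sum(node_forecast * share * adjusted[t] for t, share in type_shares.items())
```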

1124 The person-level inflation-adjusted forecasts are summed by group, and the actual cost is compared to the forecast. The group-level models make adjustments when the actual cost differs from the forecast.

The underwriting period data are scored using the model developed on the base and next periods. Risk factors need to be calculated for the underwriting period data in order to apply the model. The summed scored data, with appropriate cost trend assumptions, produce the expected policy period costs (the actual expected cost for the policy period) using the person-level models.

FIG. 12 is a detailed flowchart illustrating processing steps for developing group-level models and making adjustments to the summary of the person-level data of steps 106 and 108 of FIG. 1, 204, 208 and 210 of FIG. 2, or 304, 306 and 310 of FIG. 3. The steps are similar to the person-level modeling steps. First, the development model is calculated using the base and next period data. The model is then applied to the underwriting period data (i.e., scoring the data) to forecast the policy period costs. With the group-level model, there is model development using the base and next periods, followed by risk factor coding and scoring of the underwriting period to produce the estimated policy period costs for pricing the policy. The processing block descriptions for FIG. 12 are:

1202 There are likely to be characteristics of insured groups that can influence a group's costs of care over and above those explained by the characteristics of the persons in the group. For this reason, we develop a model to identify such intergroup differences, together with a way of applying the model's results to adjust each group's expected payments from the person-level models. First, the person-level expected payments are summed by group.

1204 The group-level development models have the following characteristics:

- Unit of observation: the “group”
- Dependent variable: next period residual dollars per person per day in the group (i.e., group total next period actual payments less group total next period forecast payments, divided by the number of people in the group, divided by 365 days)
- Candidate predictor variables are coded and include the following:
    - Benefit attributes
        - alternative insurance plan
        - deductible
        - co-payment
        - exclusions
        - dependent coverage
    - Benefit plan type: indemnity, PPO, POS, lock-in HMO
    - Payment type: fee for service or capitation
    - Demographic cells: proportion in age range by relationship by sex
    - COB in base period
    - Capitation payments by provider type
    - Number of subscribers
    - Average family size and proportion in each family composition class
    - SIC code
    - Geographic locale
    - Actual mean payments in base (underwriting) period per person per day
    - Expected mean payments in next (policy) period per person per day
    - Percent of enrollees joining during base period or leaving during base period

Payment carve-outs for capitation: if specific types of services are paid by capitation (e.g., primary care, OB/GYN), then risk factors need to be developed that will allow the group-level model to reduce the payments, since those services are covered by the capitation payments. Dummy risk factors for the presence or absence of capitated payments by provider type will need to be included when not all services are covered by fee-for-service payments.

1206 A least squares regression tree including selected interaction terms as predictors is developed on the group-level data (other data mining techniques that develop and test numerous interactions, such as neural networks, rule induction, genetic algorithms, or clustering techniques, could be used instead of regression trees). This second level of modeling makes adjustments for information not included at the person level.

1208 An ordinary least squares model (other types of regression, neural networks, or other types of predictive models could be used instead of OLS regression) is applied to the predictor variables that were important in the model preceding this step. The candidate predictor variables include the terminal nodes as dummy variables and the main effects used to define the terminal nodes.

1210 The predicted values from the model in 1208 are the average per person per day error (i.e., residual) in the estimate of next period payments for everybody in the group. This residual is added to each person's next period expected payments from the person-level models (i.e., subtracted if it is negative). The model is developed on historical data that need no cost trend adjustment other than annualization, since the cost trend is in the data. When the models are used for setting prices for the policy period, the inflation-adjusted person-level next period payment estimates are used as input, and the groups are scored using the group-level models. Risk factors are coded for the group using the underwriting period data, and the groups are scored with the group-level model to produce the policy period expected group-level costs.
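The dependent variable defined in 1204 and the adjustment applied in 1210 can be sketched in Python as follows. The row layout and function names are hypothetical, and the group-level tree and OLS model that predict the residual are not shown. The residual is computed exactly as 1204 states: group actual minus group forecast, divided by the number of people, divided by 365.

```python
from collections import defaultdict


def group_residual_per_person_day(rows, days=365):
    """rows: (group_id, actual_next_period_payment, forecast_next_period_payment), one per person.
    Returns group_id -> (actual total - forecast total) / number of people / days."""
    actual = defaultdict(float)
    forecast = defaultdict(float)
    n = defaultdict(int)
    for group, act, fc in rows:
        actual[group] += act
        forecast[group] += fc
        n[group] += 1
    return {g: (actual[g] - forecast[g]) / n[g] / days for g in n}


def apply_group_adjustment(person_expected_per_day, predicted_group_residual):
    """Add the model-predicted group residual to each member's expected payment per day
    (a negative residual lowers the expectation)."""
    return [e + predicted_group_residual for e in person_expected_per_day]
```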

Alternatively, the MAP4HIP method can be used to forecast person-level cost for individual (or family) renewal health insurance. The same methods apply, but there is no “group” other than the family. The costs for the individual family members are summed to produce the family-level forecast. A family-level model can be used for final cost adjustments. The family-level risk factors are family composition, benefit plan, geographic locale and other factors germane to the family rather than to an employment “group”.

FIG. 13 is a detailed flowchart of an embodiment of a price optimization procedure that may be used to carry out steps 110, 212, or 308 of FIGS. 1-3. The processing block procedures of FIG. 13 are:

1302 The group cost estimate is the final output from the cost estimation system (i.e., expected medical costs in the policy period). It is at the group level and includes the inflation trend estimate.

1304 The CI provides three sets of inputs that are used in the price optimization. The first input is the CI's expected probability of retaining the group if the group's price is increased by a specified amount. Rate increases will generally not be negative unless there is medical price deflation. Many probability estimates are gathered at small increments of price increase around the client's target profit, and sparser estimates are gathered further from the targeted profit margin. The client needs to consider the group's historical costs, inflation, local competitive pricing, and other factors that influence the group's likelihood of accepting the various price increases. The second input from the client is the administrative cost allocable to that group. This cost may be expressed as a percentage of the expected medical costs or in dollars per year. The final input is the minimum acceptable expected profit or profit margin.

The following Table 3 is an example of price forecasting using the probability of retention and other related input data for steps 1304, 1306, 1308 and 1310:

TABLE 3
Price Forecast Example

Price      Probability    Ratio of Admin   Next Year   Next Year    Expected
Increase   of Retention   to Cost          Price       Total Cost   Profit
0.00       0.95           0.25             1500        1375         118.75
0.02       0.92           0.25             1530        1375         142.60
0.04       0.90           0.25             1560        1375         166.50
0.06       0.85           0.25             1590        1375         182.75
0.08       0.80           0.25             1620        1375         196.00
0.10       0.73           0.25             1650        1375         200.75
0.12       0.68           0.25             1680        1375         207.40
0.14       0.63           0.25             1710        1375         211.05
0.16       0.58           0.25             1740        1375         211.70
0.18       0.53           0.25             1770        1375         209.35
0.20       0.45           0.25             1800        1375         191.25
0.25       0.35           0.25             1875        1375         175.00
0.30       0.25           0.25             1950        1375         143.75
0.35       0.15           0.25             2025        1375          97.50
0.40       0.05           0.25             2100        1375          36.25
0.45       0.01           0.25             2175        1375           8.00
0.50       0.00           0.25             2250        1375           0.00

The optimal price is $1740 per person, a 16% increase. Costs are expected to be $1375 per person, and there is a 58% chance of retaining the group. This yields an expected profit of $211.70 per person.

1306—The expected profit (or profit margin) is calculated by the following formula: expected profit = (probability of accepting price offered) × [((1 + proportion price increase) × (price in previous period)) − (expected policy year medical costs) − (administrative costs)].

This is the expected profit (the margin is obtained by dividing by the group's price), and it is calculated for each rate increase and its probability of retention or acceptance. The maximum expected profit is the largest amount (or the closest to zero if all are negative) calculated in the preceding step. The largest expected profit is compared to the client's minimum acceptable expected profit.
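Steps 1304 and 1306 can be sketched as a scan over the retention curve, applying the expected-profit formula to each point and keeping the maximum. This is a minimal illustration, not the program of Appendix F; the function names are hypothetical, and splitting Table 3's $1375 total cost into $1100 medical plus $275 administrative cost (reading the 0.25 ratio as administrative cost to medical cost) and the $150 minimum profit are assumptions. Only the total cost affects the result.

```python
# Retention curve from Table 3: (proportion price increase, probability of retention).
RETENTION_CURVE = [
    (0.00, 0.95), (0.02, 0.92), (0.04, 0.90), (0.06, 0.85), (0.08, 0.80),
    (0.10, 0.73), (0.12, 0.68), (0.14, 0.63), (0.16, 0.58), (0.18, 0.53),
    (0.20, 0.45), (0.25, 0.35), (0.30, 0.25), (0.35, 0.15), (0.40, 0.05),
    (0.45, 0.01), (0.50, 0.00),
]


def expected_profit(p_retain, increase, prior_price, medical_cost, admin_cost):
    """Step 1306: P(accept) x [(1 + increase) x prior price - medical - admin]."""
    return p_retain * ((1 + increase) * prior_price - medical_cost - admin_cost)


def optimize_price(curve, prior_price, medical_cost, admin_cost, min_profit):
    """Evaluate every point on the retention curve and return
    (increase, price, expected profit, meets_minimum) for the best one."""
    rows = [
        (inc, (1 + inc) * prior_price,
         expected_profit(p, inc, prior_price, medical_cost, admin_cost))
        for inc, p in curve
    ]
    inc, price, profit = max(rows, key=lambda row: row[2])
    return inc, round(price, 2), round(profit, 2), profit >= min_profit


if __name__ == "__main__":
    # Hypothetical split of the $1375 total cost: $1100 medical + $275 admin.
    print(optimize_price(RETENTION_CURVE, 1500, 1100, 275, min_profit=150))
    # (0.16, 1740.0, 211.7, True)
```

The result reproduces the optimum reported for Table 3: a 16% increase to $1740 with $211.70 expected profit per person. If the maximum fell below `min_profit`, the flag would be `False`, corresponding to the branch of step 1308.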

1308—If the maximum expected profit is below the minimum acceptable profit, then the expected profit calculations are printed out and the underwriter may run additional analyses to test the sensitivity of the assumptions. The price at which the expected profit equals the minimum acceptable profit is also printed out. If the underwriter wants to modify the probabilities in the retention curve, those are changed and step 1304 is repeated.

1310—If the maximum expected profit is greater than the minimum acceptable profit, then the profit-maximizing price, its percentage increase, and the expected costs and profits are printed out for the underwriter, along with the same output for non-optimal prices. The underwriter would offer the price that maximizes expected profit.

Another consideration when pricing the product is the variability of the forecast cost for the policy year. Greater variability should carry an additional risk premium. Therefore, the standard error of the group's expected medical cost is also calculated and printed. SAS or S Plus regressions will calculate the variability of the mean, or the standard error of the estimate of the policy year cost, by combining the standard errors of the person-level forecasts. The price that provides a 90% (or some other high) probability of breaking even is calculated using the standard error and printed. An underwriter can use this high-probability break-even price and the relative standard error in negotiating price. If the relative standard error is large (e.g., the standard error of the group divided by the average standard error), the underwriter would be less inclined to discount the price in a competitive market, since the likelihood of a loss is increased. Code for a program to run a pricing example is found in Appendix F.
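A high-probability break-even price can be sketched with a normal approximation: add the standard normal quantile for the chosen probability times the group's standard error to the expected cost. This is an illustrative sketch, not the Appendix F program. It assumes person-level forecast errors are independent when they are combined into a group standard error, which the text does not state; the function names and the example figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def group_standard_error(person_ses):
    """Standard error of the group's mean per-person forecast, combining the
    person-level standard errors under an independence assumption."""
    return sqrt(sum(se * se for se in person_ses)) / len(person_ses)


def break_even_price(expected_cost, group_se, prob=0.90):
    """Price that covers the expected cost with probability `prob`,
    using a normal approximation to the group's mean cost."""
    return expected_cost + NormalDist().inv_cdf(prob) * group_se


if __name__ == "__main__":
    se = group_standard_error([300.0] * 100)       # 100 members, SE 300 each
    print(se)                                      # 30.0
    print(round(break_even_price(1375.0, se), 2))  # 1413.45
```

A different probability (e.g., 0.95) only changes the quantile, and a larger group standard error widens the gap between the expected cost and the break-even price.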

1312—If the underwriter does not want to modify the retention curve, the underwriter offers the group the price that produces the minimum acceptable profit for the client, even if the group is expected to reject the offer.

1314—The final step in pricing involves translating the average price per person per day into a monthly price per subscriber unit (e.g., single person, enrollee with spouse, enrollee with two or more additional dependents; other subscriber unit constellations are also possible). Costs are traditionally presented as cost per member per month (pmpm). However, subscriber units are used for pricing, and it is important that costs are rationally allocated to the subscriber units. The daily price is multiplied by 365/12 to calculate the monthly price (or rescaled for another time period). One alternative for pricing the subscriber units is to calculate the mean cost forecast per subscriber unit for the group and then inflate each mean subscriber cost by the average profit margin for the group (i.e., recommended optimal price/expected cost). The mean cost forecast per subscriber unit is calculated by summing the forecast cost per person for each person who is a member of that type of subscriber unit in the underwriting period and then dividing that sum by the number of subscribers of that type (not people) in the underwriting period. This gives the group's mean daily cost per subscriber for each type of subscriber unit. Another pricing alternative is to set the price for the subscriber units that are considered very price sensitive just below the market price. The remaining subscriber units must then be priced so that the overall expected profit is maintained. That can be calculated by estimating the expected profit for the market-priced subscriber units and subtracting it from the total expected profit for the group. The other subscriber units must account for the remaining profit requirement.
Their price can be set so that their profit equals the remaining profit requirement by solving the following equation for price per subscriber unit: (total expected profit − market-priced subscriber profit) = remaining profit = (number of remaining subscriber units) × ((price/remaining subscriber unit) − (mean expected cost/remaining subscriber unit)). Solving the equation gives an average price per remaining subscriber unit: (price/remaining subscriber unit) = ((remaining profit)/(number of remaining subscriber units)) + (mean expected cost/remaining subscriber unit). If there are two or more types of remaining subscriber units, the price can be prorated based on the average forecast cost per remaining subscriber unit. This approach can also be used for pricing stop loss medical insurance. Alternative allocations of profit to subscriber units are possible. Those of ordinary skill will appreciate that the relation of expected cost to the terms of the medical insurance will vary among insurance types. For example, first dollar products will have higher expected costs than stop-loss products.
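Both subscriber-unit pricing alternatives can be sketched in a few functions: mean daily cost per unit type (dividing by subscribers, not people), inflation by the group's margin with the 365/12 monthly conversion, the remaining-unit price equation, and cost-proportional proration. This is an illustrative sketch; the function names, the input layout, and the choice to weight the proration by subscriber counts so total revenue is preserved are assumptions.

```python
from collections import defaultdict

DAYS_PER_MONTH = 365 / 12


def mean_daily_cost_per_unit(people):
    """people: (subscriber_id, unit_type, forecast_daily_cost) per person.
    Sum forecast costs by unit type, then divide by the number of distinct
    subscribers (not people) of that type."""
    totals = defaultdict(float)
    subscribers = defaultdict(set)
    for subscriber_id, unit_type, daily_cost in people:
        totals[unit_type] += daily_cost
        subscribers[unit_type].add(subscriber_id)
    return {t: totals[t] / len(subscribers[t]) for t in totals}


def monthly_prices_by_margin(unit_costs, optimal_price, expected_cost):
    """First alternative: inflate each unit's mean daily cost by the group's
    average margin (optimal price / expected cost) and convert to monthly."""
    markup = optimal_price / expected_cost
    return {t: cost * markup * DAYS_PER_MONTH for t, cost in unit_costs.items()}


def remaining_unit_price(total_profit, market_unit_profit, n_remaining,
                         mean_cost_remaining):
    """Second alternative: average price per remaining (non-market-priced) unit."""
    remaining_profit = total_profit - market_unit_profit
    return remaining_profit / n_remaining + mean_cost_remaining


def prorate(avg_price, unit_costs, unit_counts):
    """Spread the average remaining-unit price across unit types in proportion
    to each type's mean forecast cost, weighting by subscriber counts."""
    n = sum(unit_counts.values())
    mean_cost = sum(unit_costs[t] * unit_counts[t] for t in unit_costs) / n
    return {t: avg_price * unit_costs[t] / mean_cost for t in unit_costs}


if __name__ == "__main__":
    people = [("s1", "single", 4.0), ("s2", "single", 6.0),
              ("s3", "family", 3.0), ("s3", "family", 2.0), ("s3", "family", 1.0)]
    print(mean_daily_cost_per_unit(people))  # {'single': 5.0, 'family': 6.0}
```

The three family members share one subscriber, so the family unit's mean is 6.0 per day rather than 2.0 per person.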

Estimating the costs that need to be considered for reserves for first dollar health insurance and for stop loss coverage is an alternative use of the cost forecasting process. Rather than predicting payments that will occur over the entire policy period, reserving requires predicting costs that will occur in the upcoming financial reporting period (e.g., fiscal year or quarter). The same cost forecasting process (data collection and validation, risk factors, data mining and statistical techniques at the person and group levels, and testing and reporting) can be applied to produce cost estimates to be used in setting reserves. The dependent variable needs to be changed so that the reserving model is calibrated to the appropriate time period.

The reserving model forecasts costs that have been incurred but not reported (IBNR), and this may include some costs of claims that have not yet occurred but fall within the financial reporting period. Typically, the reserving period will run through the end of the current fiscal quarter or year. Inflation needs to be accounted for; the time period is far shorter than for the renewal cost forecast product, but the same techniques apply over the shortened time period.

A development period model is calibrated using the risk factors from the claims and enrollment data in a base period to forecast total incurred claims for the financial reporting period. The underwriting period for reserving can be the 12 months of claims (if available) preceding the reserving date, or some other time period, such as the current policy period up to the reserving date. The base period for the development model must have approximately the same number of days as the underwriting period so that the forecast will not be biased. The policy period for IBNR claims begins on the first day of the financial reporting period and ends on the last day of the reporting period. For correct model calibration, the next period used in model development (covering IBNR claims or claims that have not yet occurred) must be the same length as the actual reserving period within the policy period. This is a standard MAP4HIP person-level model, possibly with a shorter next period (e.g., a quarter). The person-level forecast claims are summed to provide a total claim amount forecast. This total is used as an independent variable and is supplemented by additional independent variables that include the reported claims, historical completion rates by time into the reserving period, claims backlogs and seasonality. The total of the IBNR claims from the reserving period is the dependent variable. Note that this model is at the book of business level, so each quarter yields only one data point for the book of business. If there are too few quarters to develop a stable model, an alternative approach is recommended.

The alternative approach defines reserves as the difference between the total claim forecast for the reserving period and the incurred and reported claims during that period. In other words, the sum of the incurred and reported claims is subtracted from the total forecast claims, and the result is the reserve forecast.
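The alternative reserving calculation can be sketched as follows. Two assumptions go beyond the text and are labeled here and in the code: each member's per-day forecast is multiplied by the full number of days in the reserving period (i.e., full enrollment), and the annual cost trend is compounded smoothly over the shorter period. The function names are hypothetical.

```python
def period_trend_factor(annual_trend, days):
    """Scale an annual cost trend to a shorter reserving period
    (assumes the trend compounds smoothly through the year)."""
    return (1 + annual_trend) ** (days / 365)


def reserve_forecast(daily_forecasts, days_in_period, annual_trend,
                     incurred_and_reported):
    """Alternative reserving approach: total forecast claims for the
    reserving period minus the claims already incurred and reported.
    Assumes every member is enrolled for the whole period."""
    total_forecast = (sum(daily_forecasts) * days_in_period
                      * period_trend_factor(annual_trend, days_in_period))
    return total_forecast - incurred_and_reported


if __name__ == "__main__":
    # Hypothetical: three members' per-day forecasts, a 91-day quarter, 8% annual trend.
    print(round(reserve_forecast([4.10, 2.75, 9.40], 91, 0.08, 1200.0), 2))
```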

The reserving product can be delivered as a service bureau product or as software, either stand-alone or an ISP model, using the same data flows as used with the cost forecasting models for fully insured or stop loss coverage. The pricing module is not relevant for reserving.

The fully insured medical product uses claims information as a critical component of the cost forecasting model. Claims are available if the group is renewing first dollar health insurance, but not for a new group. Enrollment data may be available for new groups (possibly only for employees) or for individual health insurance. The same process can be applied to new groups or to individual (or family, but by convention called individual) policies by using the method for people with no claims and only enrollment data. The base period enrollment data must contain the same potential risk factors as are available for the new groups. Note that there is only one model: since there are no claims data, people cannot be separated into claims and no-claims people in the base or underwriting periods. The cost forecasting model should be developed on the client's current book of business. The dependent variable is next period's payments. The independent variables are the same as the risk factors used in the no-claims model (i.e., detailed enrollment data only). The modeling universe includes everybody rather than only those with no claims. Sometimes claims data are available for high cost cases in the new group, and these may also include the demographics and diagnoses associated with those high cost cases. This information can be included as person-level risk factors, but the same information will then need to be included as potential person-level risk factors in the base period for the development model. A group-level model can be applied to the summarized group-level data, as with renewal business. Frequently the total cost for the new group last year is available and may be used as a risk factor for the group-level model. The total group cost would then also need to be included in the base period as a potential risk factor.

The fully insured new business cost forecasting and pricing product can be delivered as a service bureau product or as software, either stand-alone or an ISP model, using the same data flows as used with the cost forecasting models for fully insured or stop loss coverage.

Aggregate-only medical stop loss insurance, such as CapCost, can have different data sources than fully insured insurance (where the data are held and owned by the insurance company), since a TPA pays the claims and holds the data for the self-insured employer. It is our intent to get the data for all of the TPA's groups so that our client, the stop loss insurer, can bid on all of the groups serviced by the TPA. Therefore, any renewal business for the TPA can use the full cost forecasting models. New business for the TPA will not have claims data available, so the enrollment-data-only new business cost forecasting technique is applicable, and the enrollment data are needed for the new group. Future refinements will include combining the historical payments, summarized by month or quarter, with the enrollment information, since person-level claims will not be available.

In order to understand the performance of CapCost versus traditional specific plus aggregate stop loss insurance, we had to create synthetic groups, since our database contained only 116 actual groups of very different sizes. Monte Carlo random samples were developed for synthetic groups of 50, 100, 250, 500, 750, 1000, and 1500 employees plus their dependents. A group of 50 employees is smaller than the smallest employer in the target market, and 1500 employees is toward the upper end of the target market for stop loss health insurance. Five hundred random groups were selected with replacement. All family members of the employees were included in the group. The claims payments were calculated for traditional $50,000 specific with 125% aggregate exclusive of specific, and for CapCost 110™. CapCost 110™ is aggregate-only coverage with a 110% attachment point. TruRisk models were applied to forecast next year's claim payments. CapCost 110™ medical claims payments for groups of 50 employees are about 80% of the claims paid out for traditional $50,000 specific plus 125% aggregate stop loss. Once there are 250 or more employees, the CapCost 110™ claims payout is less than 50% of the payout for traditional stop loss coverage. Similar results were seen for $25,000 specific and $75,000 specific, both plus 125% aggregate coverage. The CapCost 110™ payout is much lower than the payout for $25,000 specific plus 125% aggregate, and closer to that for $75,000 specific plus 125% aggregate. The mean and standard deviation are presented in TABLE 6 for three different group sizes; 125% aggregate is included with each of the specific coverages. The mean claims paid out are lower with CapCost 110™, and the standard deviation is smaller than with traditional stop loss coverage. The main factor causing this is the far lower frequency of claims with CapCost 110™ (18-26% of groups) compared with traditional specific plus aggregate coverage (87-98% of groups).
When a claim was made with CapCost 110™ coverage, it was larger, and its standard deviation was also larger, than for claims with traditional stop loss coverage.
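The synthetic-group comparison can be sketched as a small Monte Carlo simulation: draw employees (with all of their family members) with replacement, then compute the per-employee payout under traditional specific plus 125% aggregate (exclusive of specific) and under a 110% aggregate-only attachment. This is a hypothetical sketch, not the study's code. The claim population is simulated from a lognormal distribution, and the expected claims that set the attachment points are book-wide means standing in for the TruRisk/MAP4HIP forecasts. All names and parameters are assumptions.

```python
import random
from statistics import mean, pstdev


def specific_plus_aggregate(person_claims, deductible, agg_attachment):
    """Traditional stop loss: specific pays each person's claims above the
    deductible; aggregate pays retained (capped) claims above its attachment
    point, exclusive of the specific amounts."""
    specific = sum(max(0.0, c - deductible) for c in person_claims)
    retained = sum(min(c, deductible) for c in person_claims)
    return specific + max(0.0, retained - agg_attachment)


def aggregate_only(person_claims, attachment):
    """Aggregate-only stop loss: every claim dollar counts toward the attachment."""
    return max(0.0, sum(person_claims) - attachment)


def simulate(families, n_employees, n_groups=500, deductible=50_000.0, seed=1):
    """Draw synthetic groups of employees (each with all family members) with
    replacement and summarize payouts per employee under both designs."""
    rng = random.Random(seed)
    # Hypothetical stand-in for the model forecast: book-wide means per family.
    exp_total = mean(sum(f) for f in families)
    exp_retained = mean(sum(min(c, deductible) for c in f) for f in families)
    payouts = {"traditional": [], "aggregate_only_110": []}
    for _ in range(n_groups):
        group = [c for fam in rng.choices(families, k=n_employees) for c in fam]
        payouts["traditional"].append(specific_plus_aggregate(
            group, deductible, 1.25 * n_employees * exp_retained) / n_employees)
        payouts["aggregate_only_110"].append(aggregate_only(
            group, 1.10 * n_employees * exp_total) / n_employees)
    return {
        name: {"mean": mean(xs), "std": pstdev(xs),
               "share_paid": sum(x > 0 for x in xs) / n_groups}
        for name, xs in payouts.items()
    }


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical book: 2,000 families of 1-4 people with lognormal annual claims.
    book = [[rng.lognormvariate(7.5, 1.5) for _ in range(rng.randint(1, 4))]
            for _ in range(2000)]
    for size in (50, 250):
        print(size, simulate(book, size))
```

The `share_paid` field corresponds to the "% groups > 0" row of TABLE 6. Because specific coverage pays whenever any single person exceeds the deductible, it tends to pay in far more groups than an aggregate-only design does.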

The claims paid out for CapCost 110™ and traditional stop loss are highly correlated:

R = 0.95 for 250 employees with $25,000 specific and 125% aggregate
R = 0.91 for 500 employees with $50,000 specific and 125% aggregate
R = 0.87 for 750 employees with $75,000 specific and 125% aggregate

The risks, or claims paid out, are correlated but lower for CapCost 110™, since the claim frequency is far lower with that coverage.

An aggregate-only policy can be underwritten using the group-level experience for credible groups. However, it is very important to estimate the group's costs for next year accurately, since that estimate determines the 110% attachment point. Therefore, the MAP4HIP cost forecasting method is recommended as the preferred embodiment, since its predicted mean cost is more accurate than the predicted mean cost derived using the standard approach with group-level experience as the predictor. The same steps are taken in developing the models for CapCost as are used with the general MAP4HIP process. The only difference is that the data come from a variety of TPAs as multiple data sources, rather than from one CI as with fully insured medical coverage. Person-level and group-level models are developed for cost per person per day. The risk factors, statistical methods and dependent variables are the same. The attachment point needs to be set to the appropriate amount (e.g., a 110% attachment point is calculated by multiplying the cost-trend-adjusted forecast cost by 1.1).
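The attachment-point calculation can be sketched as follows. The group forecast per person per day combines the mean person-level forecast with a group-level model's predicted residual, following the residual approach this document describes for its group-level models. That forecast is then scaled to the policy period, adjusted by the cost trend, and multiplied by 1.1. The `member_days` input (expected enrolled person-days in the policy period) and the function names are hypothetical.

```python
def group_daily_forecast(person_daily_forecasts, predicted_residual=0.0):
    """Group cost per person per day: mean person-level forecast plus the
    group-level model's predicted residual (actual minus person-level forecast)."""
    return (sum(person_daily_forecasts) / len(person_daily_forecasts)
            + predicted_residual)


def attachment_point(person_daily_forecasts, predicted_residual, member_days,
                     cost_trend, attachment_pct=1.10):
    """Aggregate-only attachment: cost-trend-adjusted forecast group cost x 1.10.
    `member_days` is the assumed number of enrolled person-days in the policy period."""
    daily = group_daily_forecast(person_daily_forecasts, predicted_residual)
    forecast_cost = daily * member_days * (1 + cost_trend)
    return attachment_pct * forecast_cost


if __name__ == "__main__":
    # Hypothetical: two members, +$1/day residual, 1000 member-days, 10% trend.
    print(round(attachment_point([10.0, 20.0], 1.0, 1000, 0.10), 2))
```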

The aggregate-only cost forecasting product can be delivered as a service bureau product or as software, either stand-alone or an ISP model, using the same data flows as used with the cost forecasting models for fully insured coverage.

TABLE 6

                         250 employees           500 employees           750 employees
                         CapCost  $25,000 spec   CapCost  $50,000 spec   CapCost  $75,000 spec
Total, 500 groups
  $/employee             229      582            104      278            87       212
  std. dev.              681      789            301      385            229      303
Groups with claims > 0
  % groups > 0           26.40%   98.20%         18.20%   89.20%         20.80%   87.00%
  # groups > 0           132      491            91       446            104      435
  $/employee             867      592            569      311            419      243
  std. dev.              1099     791            483      394            336      312
  minimum                2.08     0.3            8.04     6.34           5.32     2.63
  maximum                6479     7066           1823     1921           2027     2026

The MAP4HIP method can be used for cost forecasting for specific stop loss coverage. Specific stop loss pays for claims above a specified threshold (i.e., the deductible). Those claims costs can be forecast using the same techniques that MAP4HIP uses for forecasting outlier amounts. First, the forecast inflation or cost trend adjustment for the policy period must be applied to the model development data. This is a different order of steps from the standard MAP4HIP sequence, but it is necessary because of the specific deductible. For example, if there is a $50,000 deductible and a 10% cost trend, then a $50,000 claim in the next period would yield a $0 specific claim. If that claim occurred in the policy period after 10% inflation, it would produce a $5,000 specific claim ($50,000 × 1.1 = $55,000; subtracting the $50,000 deductible yields a $5,000 specific claim). Inflation during the lag period must also be added, and inflation built into the development model must be divided out, to provide accurate future cost estimates for modeling specific claims. After the inflation adjustment of the next period data, costs are recalculated so that they are zero if the person's claims in the next year are below the deductible (similar to Winsorization). If the claims total more than the deductible, the specific cost is set to the amount above the deductible. Probability models are developed for claims and no-claims people in the base period. The probabilities are weighted by the average cost in the terminal node (above the deductible) to produce the expected cost. The person-level forecasts are summed to make the group-level forecast. Group-level models with the same risk factors as MAP4HIP are developed using the residual of the actual specific payments per person per day minus the forecast specific costs. After the development period models are complete, they can be applied to data from an underwriting period to develop cost forecasts for a policy period.

Aggregate stop loss is frequently added to specific coverage. Aggregate coverage written with specific coverage is paid exclusive of specific claims, and specific claims are not used in defining the attachment point. Therefore, the claim amount for aggregate stop loss written with specific coverage can be modeled using the inlier methods in the MAP4HIP method, with the specific deductible as the Winsorization point. As with specific coverage, the cost trend forecast for the policy period must be applied to the next period data before the inlier calculations. Only inliers are modeled, since the specific costs will be borne by the specific coverage. The specific coverage and the aggregate-with-specific coverage should be modeled and priced separately. Note that this differs from aggregate-only stop loss coverage, where all costs contribute to both the attachment point and the aggregate claim amount.
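The trend-first preparation of the development data for specific and aggregate-with-specific modeling can be sketched as follows. Each person's next-period claims are restated at policy-period cost levels (policy-period and lag-period trend applied, built-in trend divided out) and then split at the specific deductible. The portion above the deductible is the specific target, and the capped portion is the inlier target for the aggregate model. The expected specific cost weights a probability by the terminal node's mean excess. The function names and the multiplicative combination of the trend terms are assumptions.

```python
def trend_factor(policy_trend, lag_trend=0.0, built_in_trend=0.0):
    """Restate next-period claims at policy-period cost levels: apply the
    policy-period and lag-period trend and divide out trend already built in."""
    return (1 + policy_trend) * (1 + lag_trend) / (1 + built_in_trend)


def split_claims(annual_claims, deductible, factor):
    """Return (inlier, specific) for one person's trend-adjusted annual claims.
    The inlier is capped at the specific deductible (the Winsorization point);
    the specific amount is the excess over the deductible (zero if below)."""
    adjusted = annual_claims * factor
    return min(adjusted, deductible), max(0.0, adjusted - deductible)


def expected_specific_cost(p_over_deductible, mean_excess_in_node):
    """Probability of exceeding the deductible weighted by the terminal
    node's average amount above the deductible."""
    return p_over_deductible * mean_excess_in_node


if __name__ == "__main__":
    # The document's example: $50,000 claim, $50,000 deductible, 10% cost trend.
    inlier, specific = split_claims(50_000.0, 50_000.0, trend_factor(0.10))
    print(round(inlier, 2), round(specific, 2))  # 50000.0 5000.0
```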

The specific cost forecasting and specific plus aggregate cost forecasting products can be delivered as a service bureau product or as software, either stand-alone or an ISP model, using the same data flows as used with the cost forecasting models for fully insured coverage.

Group short term disability insurance (STD) is insurance that pays a portion of an employee's wages (typically 50-100%), a flat amount, or the lesser of the portion or the flat amount when an employee is disabled due to a non-work-related accident, sickness or pregnancy. The duration of the salary replacement is typically 13, 26 or 52 weeks. The MAP4HIP method can be applied to forecast STD payments with a few modifications. The potential risk factors are the same as the risk factors used with medical insurance and described in section 806, with the additional risk factors of the number of STD days and payments in the base and underwriting periods, and job classification, when these data are available. Otherwise, exactly the same potential risk factors as used with MAP4HIP can be linked to the STD days next year and modeled using the MAP4HIP modeling techniques and processes. The dependent variable in the model development database is the number of STD days in the next period. In other words, the medical claims and STD days in the base period are linked in the database to STD days in the next period for the same person, and an STD day forecasting model for the next period is developed. The interaction capturing techniques and other modeling methods are the same as for medical claims, but it is unlikely that the data need to be Winsorized and outliers modeled separately, since STD is capped at a short period. The development model is applied to score the actual underwriting period data to calculate the expected number of STD days during the policy period, from which the forecast claim amount is calculated. The expected number of STD days needs to be weighted by the expected cost per STD day. This can be done by averaging the STD cost per day in the underwriting period, increasing it by wage inflation, and multiplying it by the expected number of STD days.
Alternatively and preferably, each person's salary or flat rate benefit is linked to the database, and the forecast STD days are multiplied by the STD per day benefit amount (i.e., the portion of salary covered by STD) and increased by the salary inflation history. The STD cost per person is summed to produce the group's expected cost. Confidence bounds can be calculated for the number of expected STD days to provide a range of high to low cost for the group. A group-level model is built using the same group characteristics as with MAP4HIP, possibly supplemented with characteristics of the benefit plan. The group-level dependent variable is residual STD days per person, which is weighted by the mean cost per person per day to calculate the forecast claim amount.
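The preferred person-level STD costing can be sketched as forecast STD days times each person's per-day benefit (a portion of wages, a flat amount, or the lesser of the two), increased by salary inflation and summed over the group. This is an illustrative sketch. The dictionary layout, the function names, and expressing benefits per day are assumptions, and the forecast days would come from the STD day model described above.

```python
def daily_std_benefit(daily_wage, replacement_rate=None, flat_daily=None):
    """Benefit per STD day: a portion of wages, a flat amount, or the lesser
    of the two when both are specified."""
    options = []
    if replacement_rate is not None:
        options.append(replacement_rate * daily_wage)
    if flat_daily is not None:
        options.append(flat_daily)
    if not options:
        raise ValueError("need a replacement rate or a flat benefit amount")
    return min(options)


def group_std_cost(members, salary_inflation):
    """Sum over members of forecast STD days x per-day benefit x salary inflation.
    Each member is a dict with 'daily_wage', 'forecast_days' and optionally
    'rate' and/or 'flat' (hypothetical field names)."""
    total = 0.0
    for m in members:
        benefit = daily_std_benefit(m["daily_wage"], m.get("rate"), m.get("flat"))
        total += m["forecast_days"] * benefit * (1 + salary_inflation)
    return total


if __name__ == "__main__":
    members = [
        {"daily_wage": 200.0, "rate": 0.6, "forecast_days": 2.5},
        {"daily_wage": 150.0, "rate": 0.6, "flat": 80.0, "forecast_days": 1.2},
    ]
    print(round(group_std_cost(members, 0.03), 2))
```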

The STD cost forecasting product can be delivered as a service bureau product or as software, either stand-alone or an ISP model, using the same data flows (with STD days and salary information added) as used with the cost forecasting models for fully insured coverage.

Long term disability insurance (LTD) is wage replacement insurance for disabilities that run longer than STD coverage and may continue until the insured is 65 years old. Group LTD coverage is for a policy period that is typically one year. The insurer does not bear the cost of continuing disability liability from previous periods unless it was also the insurer for that period. The insurer will bear the cost of new long term disabilities that occur during the policy period and will continue to be responsible for that cost until the coverage expires (e.g., the beneficiary dies or turns 65 years old) or the beneficiary can go back to work. The probability of an LTD claim occurring during the policy period (i.e., the dependent measure) can be modeled and forecast using linked medical and LTD claims at the person level. The base period risk factors are the same as for the STD model, including medical claims and STD claims, with LTD claims linked, recoded and used as supplemental risk factors when available. The forecasting model can also be built using only medical claims and enrollment information. Logistic regression, regression trees, or hybrid trees with terminal nodes feeding into a logistic regression (the hybrid tree being the preferred embodiment) are the statistical techniques for modeling the incidence rate of LTD claims during the next period (typically one year). Other interaction capturing techniques can be used to predict the incidence rate, but they must be appropriate for modeling a variable that is bounded by 0 and 1. The development model is applied to underwriting period data to calculate the expected probability of an LTD claim during the policy period. The probabilities need to be weighted by the expected net present value of the disability to estimate the total cost of the disability (i.e., the claim amount). The net present value of the disability cost is obtained from actuarial tables.
The expected costs are summed across the group members to produce the expected group cost. The net present value needs to be derived from other databases and should be conditioned on the cause of the disability, since the cost will vary depending on the cause. The cause of the disability can be estimated from the clinical conditions defining the person's terminal node. A more accurate total cost of the disability will be calculated if the weights are conditioned on the cause of the disability.
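The cause-conditioned LTD cost weighting reduces to a probability-weighted sum. In this minimal sketch, each person's probability of a new LTD claim is multiplied by the net present value for the cause implied by that person's terminal node, with a fallback value when no cause-specific NPV exists. The names, causes, and dollar figures are hypothetical placeholders; real NPVs would come from actuarial tables or other databases.

```python
def expected_ltd_cost(members, npv_by_cause, default_npv):
    """members: (probability of a new LTD claim in the policy period, cause)
    pairs, where the cause is inferred from the person's terminal node.
    Expected group cost = sum of probability x NPV of a disability with that
    cause; `default_npv` is used when no cause-specific value is available."""
    return sum(p * npv_by_cause.get(cause, default_npv) for p, cause in members)


if __name__ == "__main__":
    members = [(0.01, "back"), (0.02, "cancer"), (0.005, None)]
    npvs = {"back": 200_000.0, "cancer": 150_000.0}  # hypothetical NPVs
    print(round(expected_ltd_cost(members, npvs, default_npv=180_000.0), 2))
```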

If a good estimate of the net present value of the future cost or length of the disability is not available for the various terminal nodes, then an index can be calculated. This index is the expected number of new disabilities for the group during the policy period divided by the “average” number of disabilities calculated using standard actuarial techniques for new LTD business. A confidence interval can be calculated for the expected number of disabilities from the expected probability of disability per person, by computing the upper and lower bounds for the group with a Lexian distribution that calculates the exact probabilities. A binomial distribution can be used, but the confidence interval will not be exact, since it assumes that everybody within the group has the same average probability. Groups whose confidence interval does not cover the “average” calculated from standard actuarial techniques are significantly higher or lower in risk and should be priced differently from the average group. Alternatively and preferably, the group's standard deviation from the mean expected number of LTD cases can be calculated using one of the distributions above. The number of standard deviations from the mean is a scale that can be used for pricing. The end points of the scale can be anchored by the market prices for the lowest and highest risk, or by actual historical LTD experience, conditioned on group size.
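The index, the exact confidence interval, and the standard-deviation scale can be sketched as follows. The sketch computes the exact distribution of the number of disabilities when each person has an independent, person-specific probability (the Poisson binomial distribution, built by dynamic programming). Treating this as the exact calculation the text attributes to the Lexian distribution is an assumption, as are the function names, the equal-tailed interval, and measuring standard deviations from the actuarial average using the group's own standard deviation.

```python
from math import sqrt


def poisson_binomial_pmf(probs):
    """Exact distribution of the number of events when person i has an
    independent probability probs[i] (dynamic programming over people)."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)   # person does not become disabled
            nxt[k + 1] += mass * p     # person becomes disabled
        pmf = nxt
    return pmf


def interval(pmf, level=0.90):
    """Equal-tailed interval: smallest/largest counts leaving at most
    (1 - level)/2 probability in each tail."""
    tail = (1 - level) / 2
    cum, lo = 0.0, 0
    for k, mass in enumerate(pmf):
        cum += mass
        if cum > tail:
            lo = k
            break
    cum, hi = 0.0, len(pmf) - 1
    for k in range(len(pmf) - 1, -1, -1):
        cum += pmf[k]
        if cum > tail:
            hi = k
            break
    return lo, hi


def risk_summary(probs, actuarial_average, level=0.90):
    """Index, exact interval, and standard deviations from the actuarial average."""
    expected = sum(probs)
    sd = sqrt(sum(p * (1 - p) for p in probs))
    lo, hi = interval(poisson_binomial_pmf(probs), level)
    return {
        "expected": expected,
        "index": expected / actuarial_average,
        "interval": (lo, hi),
        "outside_interval": not (lo <= actuarial_average <= hi),
        "sds_from_average": (expected - actuarial_average) / sd if sd > 0 else float("nan"),
    }
```

The same calculation applies to the group term life index, with death probabilities in place of disability probabilities.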

The LTD cost forecasting product can be delivered as a service bureau product or as software, either stand-alone or an ISP model, using the same data flows (with the addition of STD and LTD claims and salary information) as used with the cost forecasting models for fully insured coverage.

Group term life insurance is very similar to group disability insurance: it is for a policy period (usually one year), and the coverage and rates are typically not guaranteed beyond that period. Unlike LTD, the death benefit is a one-time payment of a known amount (usually a multiple of salary up to a limit), so there is no uncertainty over the size of the benefit. Therefore, knowing the expected number of deaths (weighted by the amount of the life insurance) will provide an accurate estimate of the cost of that group. Alternatively, a relative risk index can be calculated in the same manner as with LTD. The numerator is the expected number of deaths (possibly weighted by the death benefit), and the denominator is the “average” number of deaths (possibly weighted by the death benefit), where the average is calculated using the age by sex distribution and standard life tables calculated by actuaries. The significance of the index can be calculated using the Lexian (preferably) or binomial distribution for the person-level probabilities, by testing whether the average is covered by the confidence bounds for the group. Groups whose confidence bounds for the expected number of deaths exclude the average should have higher or lower rates than average. Groups with large confidence intervals should be charged more than groups with small confidence intervals, all other factors being equal.

The same approach used for developing the person-level probability models for LTD is used for life insurance. Medical claims from a base period are linked with deaths occurring in the next period for a very large block of business. The risk factors are the same as, or developed using a similar technique to, those used with the medical cost forecasting models. The dependent variable is the probability of death. The same interaction capturing techniques used for the LTD probability model are used for the life insurance model (i.e., the preferred embodiment is the hybrid probability tree). The development model is applied to medical claims during an underwriting period, and death forecasts are calculated for the policy period. The probability of death is weighted by the death benefit to calculate the forecast claim amount per person. The claim amounts are summed across people in the group. A group-level model can be developed that uses the sum of the probabilities (i.e., the expected number of deaths), the actual number of deaths in the base period, and the number and amount of STD and LTD claims, when available, to supplement the risk factors used in a standard MAP4HIP group-level model. Otherwise, the same medical claims and enrollment information used with MAP4HIP will suffice. The dependent measure is the forecast number of deaths, which is weighted by the expected death benefit per person to calculate the forecast claim amount.

The group term life insurance death rate and claim amount forecasting products can be delivered as a service bureau product or as software, either stand-alone or an ISP model, using the same data flows (preferably supplemented with death and salary information) as used with the cost forecasting models for fully insured medical coverage.

While the present invention has been described with respect to specific embodiments, it will be appreciated that various alternatives and modifications will be apparent based on the present disclosure and are intended to be within the spirit and scope of the following claims.

APPENDIX G Data Elements & Descriptions For Software Of CD-ROM Appendix

Field Name [Legal Values]: Description
abdpain [1, 0]: Abdominal pain or dxvar = ‘7890’
abheart [1, 0]: Abnormal heart sounds or ‘7850’ <= dxvar <= ‘7853’
acne [1, 0]: Acne or ‘706’ <= dxvar <= ‘7061’
actinseb [1, 0]: Actinic and seborrheic keratosis or ‘702’ <= dxvar <= ‘70219’
acubronc [1, 0]: Acute bronchitis and bronchiolitis-dx = 466
acuphary [1, 0]: Acute pharyngitis-dxvar = ‘462’
acusinu [1, 0]: Acute sinusitis-dxvar = :‘461’
acutonsl [1, 0]: Acute tonsillitis-dxvar = ‘463’
add [1, 0]: Attention deficit disorder-dxvar = : ‘3140’
agebrk35 [1 (35+), 0 (35 under)]: age 35+
agegp [a-k]: age groups; 0-0.9 then agegp = ‘a’; 1-4.9 then agegp = ‘b’; 5.0-17.9 then agegp = ‘c’; 18-24.9 then agegp = ‘d’; 25-34.9 then agegp = ‘e’; 35-44.9 then agegp = ‘f’; 45-54.9 then agegp = ‘g’; 55-64.9 then agegp = ‘h’; 65-74.9 then agegp = ‘i’; 75-84.9 then agegp = ‘j’; ge 85 then agegp = ‘k’;
agesq: Age Squared
ahypothy [1, 0]: Acquired hypothyroidism-dxvar =: ‘244’
aidstest [0, 1, 2 . . . number of tests]: AIDS-cpt testing codes; if cpts{i} in (‘86687’, ‘86701’, ‘86702’, ‘86703’, ‘86688’, ‘86689’) then aidstest = sum (aidstest, 1);
alcohdep [1, 0]: Alcohol dependence syndrome-dxvar = :‘303’
alerhin [1, 0]: Allergic rhinitis or dxvar = :‘477’
amt [1, 0]: generic-test purposes
anemia [1, 0]: Anemia-‘280’ <= dxvar <= ‘2859’
anginap [1, 0]: Angina pectoris or dxvar =:‘413’
antitemp [1, 0]: temporary to assist in coding prenatal cpts{i} in (‘59425’, ‘59426’)
anxiety [1, 0]: Anxiety states-dxvar = :‘3000’
artery [1, 0]: Dis of the arteries, arterioles, and capillaries-‘440’ <= dxvar <= ‘4489’
arthero [1, 0]: Coronary atherosclerosis or dxvar = :‘4140’
artipost [1, 0]: Artificial opening status and oth postsurgical states or ‘V44’ <= dxvar <= ‘V4589’
assault [1, 0]: Assault or ‘E960’ <= dxvar <= ‘E969’
asthma [1, 0]: Asthma-dxvar = : ‘493’
attsurgd [1, 0]: Attention to surgical dressing and sutures or dxvar = ‘V583’
bargain [H, S]: BARGAIN STATUS- ?
basecat [a . . . v]: a thru v-see basecata-basecatv
basecata [1, 0]: .0001 <= chgd <= .33999
basecatb [1, 0]: .34 <= chgd <= .48999
basecatc [1, 0]: .49 <= chgd <= .70999
basecatd [1, 0]: .71 <= chgd <= 1.03999
basecate [1, 0]: 1.04 <= chgd <= 1.4999
basecatf [1, 0]: 1.5 <= chgd <= 1.99999
basecatg [1, 0]: 2 <= chgd <= 2.59999
basecath [1, 0]: 2.6 <= chgd <= 3.44999
basecati [1, 0]: 3.45 <= chgd <= 4.54999
basecatj [1, 0]: 4.55 <= chgd <= 5.9999
basecatk [1, 0]: 6 <= chgd <= 7.89999
basecatl [1, 0]: 7.9 <= chgd <= 10.44999
basecatm [1, 0]: 10.45 <= chgd <= 13.7999
basecatn [1, 0]: 13.8 <= chgd <= 18.19999
basecato [1, 0]: 18.2 <= chgd <= 23.99999
basecatp [1, 0]: 24 <= chgd <= 35.99999
basecatq [1, 0]: 36 <= chgd <= 53.99999
basecatr [1, 0]: 54 <= chgd <= 80.99999
basecats [1, 0]: 81 <= chgd <= 121.49999
basecatt [1, 0]: 121.5000 <= chgd <= 181.99999
basecatu [1, 0]: 182 <= chgd <= 272.99999
basecatv [1, 0]: chgd ge 273
baseclmn: # of claims in base pd
baseclms [1, 0]: presence of base claims - yes/no
basemos [1-12]: # OF MONTHS in base period
baseyr [2 digit yr - 95, 96, 97, 98]: Year associated with base
bdate [VALID DATE]: birth date
benfitcd [3 = medical claim]: BENEFIT CODE
bn_oth [1, 0]: oth benign neoplasm- (‘210’ <= dxvar <= ‘2159’) or (‘217’ <= dxvar <= ‘2299’)
bn_skin [1, 0]: Benign neoplasm of skin-dxvar=:‘216’
bychggp [1 thru 22]: .0001 <= chgd <= .33999 bychggp = 1; .34 <= chgd <= .48999 bychggp = 2; .49 <= chgd <= .70999 bychggp = 3; .71 <= chgd <= 1.03999 bychggp = 4; 1.04 <= chgd <= 1.4999 bychggp = 5; 1.5 <= chgd <= 1.99999 bychggp = 6; 2 <= chgd <= 2.59999 bychggp = 7; 2.6 <= chgd <= 3.44999 bychggp = 8; 3.45 <= chgd <= 4.54999 bychggp = 9; 4.55 <= chgd <= 5.9999 bychggp = 10; 6 <= chgd <= 7.89999 bychggp = 11; 7.9 <= chgd <= 10.44999 bychggp = 12; 10.45 <= chgd <= 13.7999 bychggp = 13; 13.8 <= chgd <= 18.819999 bychggp = 14; 18.2 <= chgd <= 23.99999 bychggp = 15; 24 <= chgd <= 36.99999 bychggp = 16; 36 <= chgd <= 53.99999 bychggp = 17; if 54 <= chgd <= 80.99999 bychggp = 18; if 81 <= chgd <= 121.49999 bychggp = 19; 121.5000 <= chgd <= 181.99999 bychggp = 20; 182 <= chgd <= 272.99999 bychggp = 21; chgd ge 273 bychggp = 22;
calckidy [1, 0]: Calculus of kidney and ureter
cancldte: enrl cancel date
candidia [1, 0]: Candidiasis-dxvar =: ‘112’
carddysr [1, 0]: Cardiac dysrhythmias-dxvar = :‘427’
carpltun [1, 0]: Carpal tunnel syndrome-dxvar = ‘3540’
CAT4BASE [1, 2, 3, 4]: see cat4base1 - 4-code as 1 thru 4
cat4base1 [1, 0]: group 1 of 4 set-base yr claim .0001 <= chgd <= 1.49999
cat4base2 [1, 0]: 1.5 <= chgd <= 5.99999
cat4base3 [1, 0]: 6.0 <= chgd <= 23.999999
cat4base4 [1, 0]: chgd ge 24
cataract [1, 0]: Cataract-dxvar = : ‘366’
CATBMOS [A thru F]: groupings of months in base year (with or without chgs) for an ordered candidate predictor variable. WITH BASE CLAIMS: 1 <= basemos <= 3 > catbmos = ‘A: 1-3’; basemos = (4, 5) > catbmos = ‘B: 4-5’; basemos = (6, 7) > catbmos = ‘C: 6-7’; basemos = (8, 9) > catbmos = ‘D: 8-9’; basemos = (10, 11) > catbmos = ‘E: 10-11’; basemos = (12) then catbmos = ‘F: 12’; WITHOUT BASE CLAIMS: basemos = 1 > catbmos = ‘A: 1’; basemos = (2, 3, 4, 5) > catbmos = ‘B: 2-5’; basemos = (6, 7, 8) > catbmos = ‘C: 6-8’; basemos = (9) > catbmos = ‘D: 9’; basemos = (10, 11) > catbmos = ‘E: 10-11’; basemos = (12) > catbmos = ‘F: 12’;
celluabs [1, 0]: Cellulitis and abscess
cerebrov [1, 0]: Cerebrovascular disease-‘430’ <= dxvar <= ‘4389’
charge: CHARGE
CHEMO [1, 0]: chemotherapy group code; combines (lspxchem, mcptchem)
chestpn [1, 0]: Chest pain or dxvar = :‘7865’
chf [1, 0]: Congestive heart failure-dxvar = ‘4280’
chg: Charge in Base Year
chgage: Base yr chg * Age
CHGC2NZ: base pd 2n chg category
chgcata [1, 0]: Base charge category A
chgd: Base Pd Chg per Enrolled Day
chgdiff: Pred Pd Chg-Base Pd Chg
chgl: log10 base pd charge
chgp: Charge in Prediction Year
CHGPC2NZ: Nxy yr pd 2n chg category
chgpcata [1, 0]: Next Year charge category A
chgpd: Pred Pd Chg per Enrolled Day
chgpl: log10 pred pd charge
chgps: Spec Pred Pd chg
chgpw: Pred Pd Chg Winsor $400k, if chgp > 400000 then chgpw = 400000 + (.5 * (chgp − 400000));
chgpwc: Pred Charges w/ Claims
chgpwd: Pred Pd Chg per Enrolled Day Winsor $400k
chgpwl: log10 pred pd winsorized charge
chgs: if chg > 50000 then chgs = chg − 50000; else chgs = 0;
chgsq: Base yr chg squared
chgt1: Charge in Base Year Trimester 1
chgt2: Charge in Base Year Trimester 2
chgt3: Charge in Base Year Trimester 3
chgw: Base Pd Chg Winsor $400k
chgwc: Base Charges w/ Claims
chgwd96 [DOLLAR SPECIFIC TO SOURCE]: Next Pd Chg per Enrolled Day Winsor $96pd
chgwl: log10 base pd winsorized charge
chlamyd [1, 0]: Unspecified viral and chlamydial infections-dxvar in (‘0799’, ‘07998’, ‘07988’
chronbro [1, 0]: Chronic and unspecified bronchiolitis or ‘490’ <= dxvar <= ‘491’
chrsinu [1, 0]: Chronic sinusitis or dxvar = : ‘473’
ckasrcl [1, 0]: Check Age/Sex/Relation cell, occurs when all sex/relationship variables f/m###en, kd or cl are exhausted
claimcde [REQUIRE SOURCE INPUT]: CLAIM CODE
claimno [REQUIRE SOURCE INPUT]: CLAIM NBR
clmim1: # Claims October 1995
clmim10: # Claims July 1996
clmim16: # Claims January 1997
clmim22: # Claims July 1997
clmim28: # Claims January 1998
clmim34: # Claims July 1998
clmim39: # Claims December 1998
clmim4: # Claims January 1996
cmpms [1, 0]: Comps of surg and med care, not elsewhere classified-‘996’ <= dxvar <= ‘9999’
cobgnpd [1-39]: Month # Company Enrollment Starts
coendpd [1-39]: Month # Company Enrollment Ends
coins: COINSURANCE amount
commdsr [1, 0]: Potential health hazards related to communicable Dis
compcde [REQUIRE SOURCE INPUT]: enrl company
compcode [REQUIRE SOURCE INPUT]: COMPANY CODE
complic [1, 0]: combines (cmpms, dxcomp, gadvmed)
compname [REQUIRE SOURCE INPUT]: COMPANY NAME
conderm [1, 0]: Contact dermatitis and oth eczema
conjunct [1, 0]: Conjunctivitis-‘3720’ <= dxvar <= ‘3729’
constip [1, 0]: Constipation or dxvar = ‘5640’
contus [1, 0]: Contusions with intact skin surfaces or ‘920’ <= dxvar <= ‘9249’
convuls [1, 0]: Convulsions or dxvar = : ‘7803’
corncal [1, 0]: Corns, callosities, and oth hypertrophic and atrophic skin or ‘700’ <= dxvar <= ‘7019’
cough [1, 0]: Cough or dxvar = ‘7862’
cpt: CPT CODE
cutobjs [1, 0]: Cutting or piercing instruments or objects or dxvar = ‘E920’
cycle [1, 0]: Pedal cycle, nontraffic and oth or dxvar IN (‘E8003’, ‘E8013’, ‘E8023’, ‘E8043’,
‘E8053’, ‘E8063’, ‘E8073’, ‘E8206’, ‘E8216’, ‘E8226’, ‘E8236’,‘E8246’, ‘E8256’, ‘E8261’, ‘E8269’) cystbldd Cystitis and oth dsrs ofthe bladder or ‘595’ <= dxvar <= ‘5969’ 1, 0 cysturin combines(cystbldd,othurin) 1, 0 datechk CHECK DATE REQUIRE SOURCE INPUT datefrom FROM DATEREQUIRE SOURCE INPUT dateproc PROCESS DATE REQUIRE SOURCE INPUT daterptREPORTED DATE REQUIRE SOURCE INPUT datethru THRU DATE REQUIRE SOURCEINPUT deduct DEDUCTIBLE REQUIRE SOURCE INPUT deltemp cpts{i} in(‘59100’, ‘59830’, ‘59430’) or ‘59120’ <= cpts{i} <= ‘59160’ or ‘59812’<= 1, 0 cpts{i} <= ‘59821’ or ‘59840’ <= cpts{i} <= ‘59857’ or ‘59400’<= cpts{i} <= ‘59414’ or ‘59510’ <= cpts{i} <= ‘59525’ or dxs startingwith (‘V22’, ‘V23’) depnbr DEP NBR 01 = enrollee depress Majordepressive disorder- (‘2962’ <= dxvar <= ‘2963’) 1, 0 dermtosiDermatophytosis-dxvar =: ‘110’ 1, 0 diab combines (diabmell, dxdiabet)1, 0 diabmell Diabetes mellitus-dxvar = :‘250’ 1, 0 dial combines(lspxdial, mcptdial) 1, 0 discdsr Intervertebral disc dsrs or dxvar =:‘722’ 1, 0 disstat DISCHARGE STATUS diverint Diverticula of intestineor dxvar = :‘562’ 1, 0 dizzi Dizziness and giddiness or dxvar = ‘7804’1, 0 dob date of birth VALID DATE dobpatn PATIENT BIRTH DATE VALID DATEdocspec DOCTOR SPECIALITY ABBR REQUIRE SOURCE INPUT doctype DOCTOR TYPEREQUIRE SOURCE INPUT drg DRG specify version drgaltst drug, alcohol,methodone usage tsts (cpt) if (‘80100’ <= cpts{i} <= ‘80103’) or numberof tests (cpts{i} eq ‘82055’) or (‘80150’ <= cpts{i} <= ‘80299’) thendrgaltst = sum(drgaltst, 1); drugdep Drug dependence and nondependentuse of drugs-‘304’ <= dxvar <= ‘3059’ 1, 0 dsranal Anal and rectal Disor ‘569’ <= dxvar <= ‘56949’ 1, 0 dsrbone dsrs of bone and cartilage or‘730’ <= dxvar <= ‘73399’ 1, 0 dsrbrst dsrs of breast-‘610’ <= dxvar <=‘6119’ 1, 0 dsrear dsrs of external ear-dxvar = : ‘380’ 1, 0 dsreyelddsrs of eyelids-‘373’ <= dxvar <= ‘3749’ 1, 0 dsrgallb dsrs of thegallbladder and biliary tract-‘574’ <= dxvar <= ‘5769’ 1, 0 dsrlipiddsrs of 
lipid metabolism-dxvar = : ‘272’ 1, 0 dsrmens dsrs ofmenstruation and abnormal bleeding-dxvar = : ‘626’ 1, 0 dsrrefra dsrs ofrefraction and accommodation-dxvar = : ‘367’ 1, 0 dx1/proc1 ICD-9-CMCODE specify version dx2/proc2 ICD-9-CM CODE 2 specify version dx3/proc3ICD-9-CM CODE 3 specify version dx4 ICD-9-CM CODE 4 specify version dx5ICD-9-CM CODE 5 specify version dx6-40 ICD-9-CM Diagnosis (afteraggregate) specify version dxabort DX Abortion-630” <= substr (dxvar, 1,3) <= “639 1, 0 dxblood DX Blood-“280” <= substr (dxvar, 1, 3) <= “289”1, 0 dxcircul DX Circul System-390” <= substr (dxvar, 1, 3) <= “459 1, 0dxcomp DX Complications of Care-“996” <= substr (dxvar, 1, 3) <= “999”1, 0 dxcondtn DX Condn Influence Health Status-V40” <= substr(dxvar,1,3) <= “V49 1, 0 dxcongen DX Congenital Anomaly-740” <= substr(dxvar,1,3) <= “759 1, 0 dxdiabet DX Diabetes-“250” = substr (dxvar,1,3)1, 0 dxdigest DX Digestive System-520” <= substr (dxvar,1,3) <= “579 1,0 dxdonor V59” = substr (dxvar,1,3 1, 0 dxecode DX E-Code-“E01” <=substr (dxvar,1,3) <= “E99” 1, 0 dxendocr DX Endocrine, Nutrition,Metabolic-“240” <= substr (dxvar,1,3) <= “249” or 1, 0 “251” <= substr(dxvar,1,3) <= “279” dxgu DX GU System-580” <= substr (dxvar,1,3) <=“629 1, 0 dxinfec DX Infec & Parasite-“001” <= substr (dxvar,1,3) <=“139” 1, 0 dxinjury DX Injury-“800” <= substr (dxvar,1,3) <= “959” or 1,0 “980” <= substr (dxvar,1,3) <= “959” dxlvebrn DX Liveborn-V30” <=substr (dxvar,1,3) <= “V39 1, 0 dxmental DX Mental-“290” <= substr(dxvar,1,3) <= “319” 1, 0 dxmgest DX Multiple Gestation-“651” = substr(dxvar,1,3) 1, 0 dxmskel DX Musculoskel & connect tiss-710” <= substr(dxvar,1,3) <= “739 1, 0 dxneoben DX Neoplasm Benign-210” <= substr(dxvar,1,3) <= “229 1, 0 dxneomal DX Neoplasm Malig-“140” <= substr(dxvar,1,3) <= “209” 1, 0 dxnerves DX Nervous System-“320” <= substr(dxvar,1,3) <= “359” 1, 0 dxob DX Preg, Childbirth, Puerp-630” <= substr(dxvar,1,3) <= “677 1, 0 dxperhis DX Personal History-dxvar: V10-V19 1,0 dxperntl DX 
Perinatal-760” <= substr (dxvar,1,3) <= “779 1, 0 dxpoisonDX Poisoning 1, 0 dxpreg DX Pregnancy-640” <= substr (dxvar,1,3) <=“649” 1, 0 or“652” <= substr (dxvar,1,3) <= “667” dxpregv DX PregnancyV-Code-V20” <= substr (dxvar,1,3) <= “V29 1, 0 dxresp DX RespSystem-460” <= substr (dxvar,1,3) <= “519 1, 0 dxsense 360” <= substr(dxvar,1,3) <= “389 dxskin DX Skin & Subcut-680” <= substr (dxvar,1,3)<= “709 1, 0 dxspecpx DX Spec Procs & Aftercare-V50” <= substr(dxvar,1,3) <= “V58 1, 0 dxsymptm DX Symptoms, Signs, & IIIDefined-“780” <= substr (dxvar,1,3) <= “799” 1, 0 dxvaccin DX DiseaseContact or Vaccine 1, 0 dxvgnldl DX Normal Delivery-“650” = substr(dxvar,1,3) 1, 0 dysp_pul combines (cyspnea, othopd) 1, 0 dyspneaDyspnea and respiratory abnormalities-dxvar= :‘7860’ 1, 0 effdte enrleff date VALID DATE encoconr Encounter for contraceptivemanagement-dxvar= :‘V25’ 1, 0 ENRLADDR1 address 1 CONFIDENTIAL enRLADDR2Address 2 CONFIDENTIAL enrlarea AREA CODE CONFIDENTIAL enrlcity cityCONFIDENTIAL enrlm1 Enrolled October 1995 1, 0 enrlm10 Enrolled July1996 1, 0 enrlm16 Enrolled January 1997 1, 0 enrlm22 Enrolled July 19971, 0 enrlm28 Enrolled January 1998 1, 0 enrlm34 Enrolled July 1998 1, 0enrlm39 Enrolled December 1998 1, 0 enrlm4 Enrolled January 1996 1, 0enrlphne phone number CONFIDENTIAL enrlst state REQUIRE SOURCE INPUTenrollee Person is Enrollee 1, 0 enrrelfm enrollee relationship enrrellsensagenc Age at end of year 0 Code, .<age < 1 then ensagenc = “<1”; SEEDESCRIPTION 1 <= age < 5 then ensagenc = “01-05”; 5 <= age < 18 thenensagenc = “05-18”; 18 <= age < 25 then ensagenc = “18-25”; 25 <= age <45 then ensagenc = “25-45”; 45 <= age < 65 then ensagenc = “45-65”; 65<= age < 80 then ensagenc = “65-80”; 80 <= age then ensagenc = “80+”;ensxkd enrrells = 1 & ensex = M > ensxkd = A; enrrells = 1 & ensex = F >ensxkd = B; A thru F enrrells = 2 & ensex = M > ensxkd = C; enrrells = 2& ensex = F > ensxkd = D; enrrells = (3, 4, 5, 6) > ensxkd = E; elseensxkd = F; entrost comines(artipost, 
lspxentr, lspxgast) 1, 0 epistaxEpistaxis dxvar = 7847 1, 0 esopha Esophagitis dxvar = 5301 1, 0 esshypEssential hypertension-dxvar= :‘401’ 1, 0 excamt1 EXCLUSION AMT 1excamt2 EXCLUSION AMT 2 excamt3 EXCLUSION AMT 3 excamt4 EXCLUSION AMT 4exccatg1 EXCLUSION CATG 1-CATEGORY DEF - 1-coverage inelig, 2-medical18-Jan necessity, 3-n/a, 4-deductibles, 5-coins, 6-cob, 7-medicare, 8-contract max, 9-dupicate, 10-n/a, 11-non-cov, 12-copay, 13- flexplan,14-n/a, 15-exceeds sched, 16-alt proc, 17-panel contract fee, 18-n/aexccatg2 EXCLUSION CATG 2 - see description of catg 1 see exccatg1 descexccatg3 EXCLUSION CATG 3 - see description of catg 1 see exccatg1 descexccatg4 EXCLUSION CATG 4 - see description of catg 1 see exccatg1 descexchg2a 2nd Highest month chg ADJacent to 1st’ exchg2b 2nd Highest monthchg NOT ADJacent to 1st’ exclh1 Base Year Highest Monthly Pymt Per Day’exclh1ch Base Year Highest Monthly chg Per Day’ exclh2a Baseyr ‘2ndHighest Monthly Pymt ADJacent to 1st’ exclh2b Baseyr ‘2nd HighestMonthly Pymt NOT ADJacent to 1st’ eyemix combines(cataract, lensrepl,retinldt, scpteye) 1, 0 f0105kd Female 01-05 Child 1, 0 f0518kd Female05-18 Child 1, 0 f1825en Female 18-25 Enrollee 1, 0 f1825sp Female 18-25Spouse 1, 0 f1865kd Female 18-65 Child 1, 0 f2545en Female 25-45Enrollee 1, 0 f2545sp Female 25-45 Spouse 1, 0 f4565en Female 45-65Enrollee 1, 0 f4565sp Female 45-65 Spouse 1, 0 f4580ss Female 45-80Widow 1, 0 f6580en Female 65-80 Enrollee 1, 0 f6580sp Female 65-80Spouse 1, 0 f80pen Female 80+ Enrollee 1, 0 f80psp Female 80+ Spouse,Widow 1, 0 fall Falls 1, 0 fam1p1c Family is 1 Par 1 Child 1, 0 fam1p2cpFamily is 1 Parent 2+ Children 1, 0 fam2p1cp Family is 2 Parents 1+Children 1, 0 famcoup Family is Couple 1, 0 famdau Daughter in Family 1,0 famempo Family Employee Only 1, 0 famenr Enrollee in Family 1, 0famlst trimn(famlst)||enrrells; 1, 0 famnkid # of Kids per Enrollee 1, 0famofem Oth Female in Family 1, 0 famomal Oth Male in Family 1, 0famsdau Step Daughter in Family 1, 0 
famsize | # Covered Lives Per Enrollee | COUNT
famson | Son in Family | 1, 0
famspse | Spouse in Family | 1, 0
famsson | Step Son in Family | 1, 0
famsurv | Surviving Spouse in Family | 1, 0
firearm | Firearm missile | 1, 0
firestem | Fire, flames, hot sub, object, caustic, corrosive, steam | 1, 0
flt1kd | Female < 1 Child | 1, 0
followup | Follow-up examination: dxvar =: V67 | 1, 0
frachand | Fracture of hand and fingers: dxvar =: (814-8171) | 1, 0
fracllim | Fracture of lower limb: dxvar =: (820-8291) | 1, 0
fracoth | Oth fractures: dxvar = 800-81259 or 818-8191 | 1, 0
fracrad | Fracture of radius and ulna: dxvar =: 813 | 1, 0
fracskul | Intracranial injury, excluding those with skull fracture: '850' <= dxvar <= '8541' | 1, 0
gadvmed | Adverse effects of medical treatment: dxvar = E870-E879, E930-E9499 | 1, 0
gasthemm | Gastrointestinal hemorrhage: dxvar =: 578 | 1, 0
gastri | Gastritis and duodenitis: dxvar = 535 | 1, 0
gblood | Dis of the blood and blood-forming organs: '280' <= dxvar <= '2899'; group code of anemia, othblood | 1, 0
gcircul | Dis of the circulatory system: '390' <= dxvar <= '4599'; group code of anginap, arthero, othische, carddysr, chf, othheart, esshyp, cerebrov, artery, hermorrh, othcirc | 1, 0
gconanom | Congenital anomalies: dxvar = 740-7599 | 1, 0
gdigest | Dis of the digestive system: dxvar = 520-5799 | 1, 0
gendo | Endocrine, nutrit and metab Dis, and immunity dsrs: '240' <= dxvar <= '2799'; group code of ahypothy, oththyr, diabmell, dsrlipid, obesity, othendo | 1, 0
genmedex | General medical examination: dxvar =: V70 | 1, 0
ggenito | Dis of the genitourinary system; GROUP OF OTHURIN, CALCIDY, CYSTBLDD, HYPROS, INFLFEML, OTHNOLE, DSRBRST, NINFFEM, DSRMENS | 1, 0
gibluc | Combines (gasthemm, stomulcr) | 1, 0
gihs | Suppl classif of factors influ hlth stat & contact w hlth se: dxvar = V01-V829 | 1, 0
ginfect | Infectious and parasitic Dis: '001' <= dxvar <= '1398'; group code of strep, hivinfec, virlwart, chlamyd, dermtosi, candidia & othinfs | 1, 0
ginjpoi | Injury and poisoning; group code of fracrad, frachand, fracllim, fracoth, sprnwrst, sprnkne, sprnankl, sprnneck, sprnobk, sprnostr, fracskul, owndhd, owndhnd, othopnwd, suprcorn, othspin, contus, oinjury, poison, unspex, cmpms | 1, 0
ginjudet | Injuries of undetermined intent, no sub: dxvar = E980-E989 | 1, 0
gintinj | Intentional injuries; group code of assault, selfinfl, violenc | 1, 0
glacoma | Glaucoma: dxvar =: '365' | 1, 0
gmentl | Mental dsrs: '290' <= dxvar <= '319'; group code of schizo, depress, othpsych, anxiety, neurotic, alcohdep, drugdep, stress, othdepr, add & othmentl | 1, 0
gmuscu | Dis of the musculoskeletal system and connective tissue: dxvar = 710-7399 | 1, 0
gneoplsm | Neoplasm: '140' <= dxvar <= '2399'; group code of mn_coln, mn_skin, mn_brst, mn_pros, mn_lymp, mn_oth, & secondary neo's, bn_skin, bn_oth, neounsp | 1, 0
gnervous | Dis of the nervous system and sense organs: '320' <= dxvar <= '3899'; group code of migraine, othcentr, carpltun, othnerv, retinldt, glacoma, cataract, dsrrefra, conjunct, dsreyeld, otheye, dsrear, otitismd, othear | 1, 0
gperi | Certain cond originating in the perinatal period, NO SUBCATS: 760-7799 | 1, 0
gpregn | Comps of pregnancy, childbirth, and the puerperium, NO SUBCATS: dxvar = 630-677 | 1, 0
gpsorias | Psoriasis and similar dsrs; group code of oinfskin, corncal, actinseb, acne, sepacyst, urticari, osksub | 1, 0
gresp | Dis of the respiratory system: '460' <= dxvar <= '5199'; group code of acusinu, acuphary, acutonsl, acubronc, othacres, chrsinu, alerhin, chronbro, asthma, othopd, othresp | 1, 0
gskin | Dis of the skin and subcutaneous tissue; group code of celluabs, oiskin, conderm | 1, 0
gsymsig | Symptoms, Signs, and Ill-defined cond; group code of syncope, convuls, dizzi, pyrexi, syminteg, headach, epistax, abheart, dyspnea, cough, chestpn, sympurin, abdpain, othssil | 1, 0
gunint | Unintentional injuries; group code of fall, mototraf, struck, overext, cutobjs, natenvr, poisdrg, firestem, machinr, cycle, mototra, othtran, firearm, othclas, mechunsp | 1, 0
gynexam | Gynecological examination: dxvar = 'V723' | 1, 0
hchc | HCPCS CODES | 1, 0
headach | Headache: dxvar = '7840' | 1, 0
hemat | Combines (anemia, dxblood, acutonsl, gblood, othblood) | 1, 0
hermorrh | Hemorrhoids: dxvar =: '455' | 1, 0
herniabd | Hernia of abdominal cavity: '550' <= dxvar <= '5539' | 1, 0
Hi1dvby | The index of highest cost per day divided by average cost per day per month |
Hi2dvby | The index of 2nd highest cost per day divided by average cost per day per month |
Hibych2a | 1 = the second highest month cost per day is adjacent to the first month | 1, 0
Hibymos1 | The maximum cost per day for any month in the base year |
Hibych2b | 1 = the second highest month cost per day is not adjacent to the first month | 1, 0
Hibymos2 | The 2nd highest cost per day for any month in the base year |
hilo | Classify high cost nxt yr cases based on charges | 0 = low (< 96), 1 = high (ge 96)
hilopay | Classify high cost nxt yr cases based on payments | 0 = low (< 68.5), 1 = high (ge 68.5)
hivinfec | HIV infection: dx starting w/ 042 | 1, 0
hspatri1 | Hosp Admit in Trimester 1 | COUNT
hspatri2 | Hosp Admit in Trimester 2 | COUNT
hspatri3 | Hosp Admit in Trimester 3 | COUNT
hsptlos | Total Hospital LOS | DAYS
hsptlosc | Total Hospital LOS Category |
hyprpros | Hyperplasia of prostate: dxvar = '600' | 1, 0
icu_etc | Combines (lspxvein, lspxvent, mcptccth, mcptintr, pcptcrit, pulart) | 1, 0
infertil | Any mention of infertility, male or female (CPT), or dxvar in ('628', '606') | 1, 0
inflfeml | Inflammatory dsrs of female pelvic organs: '614' <= dxvar <= '6169' | 1, 0
irratcol | Irritable colon: dxvar = '5641' | 1, 0
itemno | ITEM NBR |
jntdsrs | Derangements and oth and unspecified joint dsrs: '717' <= dxvar <= '7199' | 1, 0
kid1_3 | Count of the number of children in a family: 0 = no children, 1, 2, or 3 = 3 or more | 0-3 children
lensrepl | Lens replaced by pseudophakos: dxvar = 'V431' | 1, 0
locatnme | LOCATION NAME | CONFIDENTIAL
locatno | LOCATION | CONFIDENTIAL
logi | Combines (constip, diverint, othdiges) | 1, 0
lspxampu | Life PX Amputation: cpts{i} in ('23900', '23920', '24900', '25900', '25927', '27295', '27590', '27591', '27592', '27596', '27598', '27880', '27881', '27882', '27886', '27888', '27889', '28880', '28805') | 1, 0
lspxchem | Life PX Chemotherapy: cpts{i} in ('96400', '96408', '96410', '96412', '96414', '96420', '96422', '96423', '96425', '96445', '96450', '96520') | 1, 0
lspxdial | Life PX Dialysis: cpts in ('90935', '90937', '90945', '90947') | 1, 0
lspxentr | Life PX Enterostomy: cpts{i} in ('44300', '44310', '44312', '44314', '44316', '44320', '44322', '44340', '44345', '44346') | 1, 0
lspxgast | Life PX Gastrostomy: cpts in ('3750', '43760', '43830', '43832') | 1, 0
lspxorgn | Life PX Major Organ Transplants: cpts{i} in ('33935', '33945', '47135', '40260') | 1, 0
lspxradt | Life PX Radiation Therapy: if cpts{i} in ('77261', '77263', '77280', '77285', '77290', '77295', '77299', '77300', '77305', '77310', '77315', '77321', '77326', '77327-8', '77331', '77336', '77370', '77399', '77401-4', '77406-9', '77411-4', '77416-20', '77425', '77430-2', '77470', '77499', '77000', '77605', '77610', '77615', '77620', '77750', '77761-3', '77776-8', '77781-4', '77789', '77790', '77799') | 1, 0
lspxtrch | Life PX Tracheostomy: cpts{i} in ('31600', '31603', '31610') | 1, 0
lspxvein | Life PX Venous Access Port: cpts{i} in ('36495', '36496') | 1, 0
lspxvent | Life PX Intubation/Ventilation: cpts in ('31500', '94650', '94651', '94656', '94657') | 1, 0
lumbago | Lumbago: dxvar = '7242' | 1, 0
m0105kd | Male 01-05 Child | 1, 0
m0518kd | Male 05-18 Child | 1, 0
m1825en | Male 18-25 Enrollee | 1, 0
m1845sp | Male 18-45 Spouse | 1, 0
m1865kd | Male 18-65 Child | 1, 0
m2545en | Male 25-45 Enrollee | 1, 0
m4565en | Male 45-65 Enrollee | 1, 0
m4565sp | Male 45-65 Spouse | 1, 0
m6580en | Male 65-80 Enrollee | 1, 0
m6580sp | Male 65-80 Spouse | 1, 0
m80pen | Male 80+ Enrollee | 1, 0
m80psp | Male 80+ Spouse, 65+ Widower | 1, 0
machinr | Machinery: dxvar =: 'E919' | 1, 0
male | | 1, 0
mcptallr | Med CPT Allergy: '95004' <= cpts{i} <= '95199' | 1, 0
mcptcard | Med CPT Cardiogr: '93000' <= cpts{i} <= '93350' | 1, 0
mcptcarv | Med CPT CardVascThor: '92950' <= cpts{i} <= '92996' | 1, 0
mcptccth | Med CPT CardCath: '93501' <= cpts{i} <= '93572' | 1, 0
mcptchem | Med CPT Chemotherapy: '96400' <= cpts{i} <= '96549' | 1, 0
mcptcns | Med CPT CNS: '96100' <= cpts{i} <= '96117' | 1, 0
mcptderm | Med CPT Dermatology: '96900' <= cpts{i} <= '96999' | 1, 0
mcptdial | Med CPT Dialysis: '90918' <= cpts{i} <= '90999' | 1, 0
mcptent | Med CPT ENT: '92502' <= cpts{i} <= '92599' | 1, 0
mcptintr | Med CPT IntraCard: '93600' <= cpts{i} <= '93660' | 1, 0
mcptneur | Med CPT Neurology: '95805' <= cpts{i} <= '95975' | 1, 0
mcptopth | Med CPT Ophthalm: '92002' <= cpts{i} <= '92499' | 1, 0
mcptoste | Med CPT OsteoPath: '98926' <= cpts{i} <= '98929' | 1, 0
mcptphys | Med CPT PhysTher: '97010' <= cpts{i} <= '97999' | 1, 0
mcptpsy | Med CPT Psych: '90801' <= cpts{i} <= '90899' | 1, 0
mcptpulm | Med CPT Pulmon: '94010' <= cpts{i} <= '94799' | 1, 0
mcptvasc | Med CPT VascStudy: '93875' <= cpts{i} <= '93980' | 1, 0
mechunsp | Mechanism unspecified | 1, 0
menopa | Menopausal and postmenopausal dsrs: dxvar =: 627 | 1, 0
migraine | Migraine: dxvar =: '346' | 1, 0
misc_hrt | Combines (arthero, carddysr, mcptccth, mcptintr, othische) | 1, 0
mlt1kd | Male < 1 Child | 1, 0
mn_brst | Malignant neoplasm of breast: ('174' <= dxvar <= '1759') or (dxvar = '19881') | 1, 0
mn_coln | Malignant neoplasm of colon and rectum: ('153' <= dxvar <= '1548') or (dxvar = '1975') | 1, 0
mn_lymp | Malignant neoplasm of lymphatic and hematopoietic tissue: (dxvar in ('1765', '1969')) or ('200' <= dxvar <= '20891') | 1, 0
mn_oth | Oth malignant neoplasm: ('140' <= dxvar <= '1529') or ('155'-'1719') or ('1761'-'1764') or ('1766'-'1849') or ('186'-'1958') or ('197'-'1974') or ('1976'-'1981') or ('1983'-'1987') or ('19882'-'1991') or ('230'-'2349') or dxvar = '1988' | 1, 0
mn_pros | Malignant neoplasm of prostate: dxvar = '185' | 1, 0
mn_skin | Malignant neoplasm of skin: ('172' <= dxvar <= '1739') or (dxvar in ('1760', '1982')) | 1, 0
MOS | A thru F dummies for CATBMOS | 1, 0
motontra | Motor vehicle, nontraffic: dx in ('E8200', 'E8210', 'E8220', 'E8230', 'E8240', 'E8250', 'E8205', 'E8215', 'E8225', 'E8235', 'E8245', 'E8255', 'E8207', 'E8217', 'E8227', 'E8237', 'E8247', 'E8257', 'E8209', 'E8219', 'E8229', 'E8239', 'E8249', 'E8259') | 1, 0
mototraf | Motor vehicle, traffic: 'E810' <= dxvar <= 'E8199' | 1, 0
mrh1drg | 1st Most Recent Hosp DRG | DRG
mrh1los | 1st Most Recent Hosp LOS | DAYS
mrh1mdc | 1st Most Recent Hosp MDC | MDC
mrh1ms | 1st Most Recent Hosp Medsurg | Medical surgical indicator
mrh2drg | 2nd Most Recent Hosp DRG | DRG
mrh2los | 2nd Most Recent Hosp LOS | DAYS
mrh2mdc | 2nd Most Recent Hosp MDC | MDC
mrh2ms | 2nd Most Recent Hosp Medsurg | Medical surgical indicator
mrh3drg | 3rd Most Recent Hosp DRG | DRG
mrh3los | 3rd Most Recent Hosp LOS | DAYS
mrh3mdc | 3rd Most Recent Hosp MDC | MDC
mrh3ms | 3rd Most Recent Hosp Medsurg | Medical surgical indicator
mxchgtri | Replaces chgt1-t3 and uses index | 1, 2, 3
mylagi | Myalgia and myositis, unspecified: dxvar = '7291' | 1, 0
namefir | FIRST NAME | confidential
namelast | LAST NAME | confidential
namemidl | MIDDLE INITIAL | confidential
natenvr | Natural and environmental factors: ('E900' <= dxvar <= 'E9099') or ('E9280' <= dxvar <= 'E9282') | 1, 0
ncpt9xc | # of 9xxxx cpts in year | category
ncpt9xxx | # of 9xxxx cpts in year | count
neounsp | Neop of uncertain behavior and unspec nature: '235' <= dxvar <= '2399' | 1, 0
nervsys | Combines (gneoplsm, othcentr) | 1, 0
netwkcd | NETWORK CODE | confidential
netwknme | NETWORK NAME | confidential
neurotic | Neurotic depression: dxvar = '3004' | 1, 0
newchg10 | Mean of base chg months minus the 2 highest months |
newpay10 | Mean of base pay months minus the 2 highest months |
nhosps | # of hosp visits | count
nhospsc | # of hosp visits | Category
ninenter | Noninfectious enteritis and colitis: '555' <= dxvar <= '5589' | 1, 0
ninffem | Noninflammatory dsrs of female genital organs: dxvar = 622-6249 | 1, 0
nobasepy | Basechg without payment | 1, 0
noclaims | No Claims in Base or Study Period | 1, 0
normpreg | Normal pregnancy | 1, 0
numagegp | 0 <= ensagen <= 0.9 numagegp = 1; 1 <= ensagen <= 4.9 numagegp = 2; 5.0 <= ensagen <= 17.9 numagegp = 3; 18 <= ensagen <= 24.9 numagegp = 4; 25 <= ensagen <= 34.9 numagegp = 5; 35 <= ensagen <= 44.9 numagegp = 6; 45 <= ensagen <= 54.9 numagegp = 7; 55 <= ensagen <= 64.9 numagegp = 8; 65 <= ensagen <= 74.9 numagegp = 9; 75 <= ensagen <= 84.9 numagegp = 10; ensagen ge 85 numagegp = 11 | 1 thru 11
obesity | Obesity: dxvar =: '2780' | 1, 0
obseval | Observation and evaluation for suspected cond not found: dxvar =: V71 | 1, 0
oinfskn | Oth inflammatory condition of skin and subcutaneous tissue: dxvar = 690-6918, 693-6959, 697-6989 | 1, 0
oinjury | Oth injuries | 1, 0
oiskin | Oth infection of the skin and subcutaneous tissue | 1, 0
omusccn | Oth Dis of the musculoskeletal system and connective tissue: dxvar 734-7399 | 1, 0
osksub | Oth dsrs of the skin and subcutaneous tissue: dxvar 7028, 709, 703-7059, 7063-7079 | 1, 0
ostealld | Osteoarthrosis and allied dsrs: dxvar 715 | 1, 0
othacres | Oth acute respiratory infections: (dxvar = '460') or ('464' <= dxvar <= '4659') | 1, 0
otharth | Oth arthropathies and related dsrs: dxvar 710-7138, 7141-7149, :716 | 1, 0
othblood | Oth Dis of the blood and blood-forming organs: '286' <= dxvar <= '2899' | 1, 0
othcentr | Oth dsrs of the central nervous system: ('320' <= dxvar <= '326') or ('330' <= dxvar <= '3379') or ('340' <= dxvar <= '3459') or ('347' <= dxvar <= '3499') | 1, 0
othcirc | Oth Dis of the circulatory system: (dxvar in ('390', '3929', '403', '405', '417')) or ('451'-'4549') or ('456'-'4599') | 1, 0
othclas | Oth and not elsewhere classified: dxvar E925-E9269, E988, E9290-E929, E925-E9269, E9288, E9290-E929 | 1, 0
othdepr | Depressive reaction, not elsewhere classified: dxvar = '311' | 1, 0
othdiges | Oth Dis of the digestive system: DXVAR 526-5300, 5302-5309, 536-5439, 5642-5649, 567-5689, 5695-5739, :(560, 577, 579) | 1, 0
othdorso | Oth dorsopathies: DXVAR 720-72191, 723-7241, 7243-7249 | 1, 0
othear | Oth Dis of the ear and mastoid process: '383' <= dxvar <= '3899' | 1, 0
othendo | Oth endocrine, nutrit and metabolic Dis, and immunity dsrs: ('251' <= dxvar <= '2719') or ('273' <= dxvar <= '2779') or ('2781' <= dxvar <= '27903') | 1, 0
otheye | Oth dsrs of the eye and adnexa: (dxvar =: '360') or ('363'-'3649') or ('368'-'3699') or ('370'-'3719') or ('3724'-'3729') or ('375'-'3799') | 1, 0
otheye | (dxvar =: '360') or ('363' <= dxvar <= '3649') or ('368' <= dxvar <= '3699') or ('370' <= dxvar <= '3719') or ('3724' <= dxvar <= '3729') or ('375' <= dxvar <= '3799') | 1, 0
othfeml | Oth dsrs of the female genital tract: DXVAR 617-6199, 621, 625, 628, 629 | 1, 0
othhealt | Oth factors influencing hlth stat and contact with hlth serv: DXVAR V200-201, :V21, V290-V430, V432-V389, V46-V669, V68-V699, V720-V722, V724-V829 | 1, 0
othheart | Oth heart disease: ('391' <= dxvar <= '3920') or ('393'-'39899') or (dxvar in :('402', '404')) or ('415'-'4169') or ('420'-'4269') or ('4281'-'4299') | 1, 0
othinfs | Oth infectious and parasitic disease: ('001' <= dxvar <= '0339') or ('0341'-'0419') or ('045'-'0780') or ('0782'-'07981') or (-'07999') or ('080'-'1049') or (dxvar =: '111') or ('114'-'1398') | 1, 0
othische | Oth ischemic heart disease: DXVAR 410-412, 4141-4149 | 1, 0
othmale | Oth dsrs of male genital organs: DXVAR 601-6089 | 1, 0
othmentl | Oth mental dsrs: ('312' <= dxvar <= '3139') or ('3141'-'319') or ('3001'-'3003') or ('3005'-'3009') or ('301'-'3026') or ('306'-'3079') or (dxvar =: '310') | 1, 0
othnerv | Oth dsrs of the nervous system: ('350' <= dxvar <= '3539') or ('3541' <= dxvar <= '3599') | 1, 0
othopd | Oth COPD and allied cond: DXVAR 492, 494-496 | 1, 0
othopnwd | Oth open wound: DXVAR 874-8812, 884-8977 | 1, 0
othpsych | Oth psychoses: ('290' <= dxvar <= '2949') or ('2960' <= dxvar <= '2961') or ('2964' <= dxvar <= '2999') | 1, 0
othrepro | Oth encounter related to reproduction: V23-V242, V26-V289 | 1, 0
othresp | Oth Dis of the respiratory system: 470-4722, 474-4761, 478, 4780, 4781, :487, 500-5199 | 1, 0
othrhexb | Oth rheumatism, excluding back: DXVAR 725, 7271-7279, :728, :7290, 7292-7299 | 1, 0
othspin | Oth superficial injury: DXVAR 910-9180, 9182-9199 | 1, 0
othssil oth symptoms, signs, and ill-defined cond-DXVAR 7800-7801,1, 0 :(7805, 781, 783, 7861), 7807-7809, 7841-78469, 7848-7849,7854-7859, 7863-7864, 7866-78799, 7891-7999 oththyr oth dsrs of thethyroid gland-(‘240’ <= dxvar <= ‘243’) or 1, 0 (‘245’ <= dxvar <=‘2469’) othtran oth transportation dxvar FOR E800X-E807X WHEN X EQ 0, 2,8 OR 9 1, 0 othtype Other genetic typing tsts for transplants cpts{i} in1, 0 (‘86805’, ‘86806’, ‘86807’, ‘86808’, ‘86821’, ‘86822’, ‘86849’)othurin oth Dis of the urinary system-580-5899, 590-591, 593-5949,597-5989, 5991-5999 1, 0 otitismd Otitis media and Eustachian tubedsrs-‘381’ <= dxvar <= ‘3829’ 1, 0 ounspex oth and unspecified effectsof external causes DXVAR 990-99589 1, 0 overext Overexertion andstrenuous movements DXVAR E927 1, 0 owndhd Open wound of head-DXVAR870-8739 1, 0 owndhnd Open wound of hand and fingers DXVAR 882-8832 1, 0pay Payment in Base Year payment PAYMENT AMT payp Payment in PredictionYear pbynoby ‘prebase, base 1, 0 pbyothr ‘prebase, base >0’ 1, 0pcptborn CPT Place Newborn-99431” <= cpts{i} <= “99490 1, 0 pcptcons CPTPlace Consult-99241” <= cpts{i} <= “99275 1, 0 pcptcrit CPT PlaceCritical Care-99291” <= cpts{i} <= “99292 1, 0 pcpter CPT Place ER99281” <= cpts{i} <= “99288 1, 0 pcpthome CPT Place Home-99341” <=cpts{i} <= “99353 1, 0 pcpthosp CPT Place Hosp-99217” <= cpts{i} <=“99238 1, 0 pcptnicu CPT Place Neon ICU-99295” <= cpts{i} <= “99298 1, 0pcptnurs CPT Place Nurs Facil 99301” <= cpts{i} <= “99313 1, 0 pcptoffCPT Place Office“99201” <= cpts{i} <= “99215” 1, 0 pcptoltf CPT PlaceOth LTCF-99321” <= cpts{i} <= “99333 1, 0 pcptpmed CPT Place PrevMed-99381” <= cpts{i} <= “99429 1, 0 penvasc combines (artery, mcptvasc,othcirc) 1, 0 periph Peripheral enthesopathies and allied dsrs-DXVAR:726 1, 0 pershyst Potential health hazards related to personal andfamily hist-DXVAR V10-V198 1, 0 pharclms Pharmacy Claims count plannamePLAN NAME confidential planno SERIAL-PLAN NBR confidential pmtchg base &(pmt/basechg ge .2) as 
1 1, 0
pneumon | Pneumonia - DXVAR 480-486 | 1, 0
poisdrg | Psning drugs, med subst, biolog, oth solid, liqd, gases, vapor | 1, 0
poison | Poisonings - DXVAR 960-9899 | 1, 0
postpart | Postpartum care and examination - DXVAR: V24 | 1, 0
prenatal | Undelivered Pregnancy - Prenatal care | 1, 0
prgage | age lt 35 (1), 0 | 1, 0
provlocn | PROVIDER LOCATION | confidential
provname | PROVIDER NAME | confidential
provnetw | PROVIDER NETWORK | confidential
provno | PROVIDER NBR | confidential
provst | PROVIDER STATE | confidential
provtype | PROVIDER TYPE | confidential
pulart | pulmonary artery cath placement, cpts{i} eq ‘93503’ | 1, 0
pyrexi | Pyrexia of unknown origin: 7806 | 1, 0
rad | combines (Ispxradt, radther, radnuc) | 1, 0
radnuc | Nuclear Medicine, cpts{i} starting with (‘78’, ‘79’) | 1, 0
radther | Any radiation therapy, cpts{i} starting with ‘77’ | 1, 0
relation | RELATIONSHIP | 1-9: ‘1’ = ‘A Enrollee’, ‘2’ = ‘B Spouse’, ‘3’ = ‘C Son’, ‘4’ = ‘D Daughter’, ‘5’ = ‘E Stepson’, ‘6’ = ‘F Stepdaughter’, ‘7’ = ‘G Other Male’, ‘8’ = ‘H Other Female’, ‘9’ = ‘I Surv Spouse’
retinldt | Retinal detachment and oth retinal dsrs - ‘361’ <= dxvar <= ‘3629’ | 1, 0
rheuarth | Rheumatoid arthritis - DXVAR 7140 | 1, 0
routchk | Routine infant or child health checks - DXVAR V202 | 1, 0
schizo | Schizophrenic dsrs - dxvar = ‘295’ | 1, 0
scptaudi | Surg CPT Auditory, “69” = substr(cpts{i}, 1, 2) | 1, 0
scptbaby | Surg CPT Matern, “59” = substr(cpts{i}, 1, 2) | 1, 0
scptcard | Surg CPT Card Vasc, “33” <= substr(cpts{i}, 1, 2) <= “37” | 1, 0
scptdgst | Surg CPT Digest, “40” <= substr(cpts{i}, 1, 2) <= “49” | 1, 0
scptdiap | Surg CPT MED & Diaphr, “39” = substr(cpts{i}, 1, 2) | 1, 0
scptendo | Surg CPT Endocrine, “60” = substr(cpts{i}, 1, 2) | 1, 0
scpteye | Surg CPT Eye, “65” <= substr(cpts{i}, 1, 2) <= “68” | 1, 0
scptfem | Surg CPT Lap/Perit/Hyst Female Genital, “56” <= substr(cpts{i}, 1, 2) <= “58” | 1, 0
scpthern | Surg CPT Hernia & Lymph, “38” = substr(cpts{i}, 1, 2) | 1, 0
scptmale | Surg CPT Male Genital, “54” <= substr(cpts{i}, 1, 2) <= “55” | 1, 0
scptmskl | Surg CPT Muscular-Skeleton, “20” <= substr(cpts{i}, 1, 2) <= “29” | 1, 0
scptnrve | Surg CPT Nerve, “61” <= substr(cpts{i}, 1, 2) <= “64” | 1, 0
scotresp | Surg CPT Respiratory, “30” <= substr(cpts{i}, 1, 2) <= “32” | 1, 0
scptskin | Surg CPT Integument, “10” <= substr(cpts{i}, 1, 2) <= “19” | 1, 0
scpturin | Surg CPT Urinary, “50” <= substr(cpts{i}, 1, 2) <= “53” | 1, 0
selfinfl | Self-inflicted - dxvar e950-e959 | 1, 0
selmalig | combines (mn_coln, mn_lymph, mn_oth, mn_pros) | 1, 0
sepacyst | Sebaceous cyst - dxvar 7062 | 1, 0
servamb | Serv Locn Ambulance - servlocn = “11” | 1, 0
servasrg | Serv Locn Ambul Surg - servlocn = “16” | 1, 0
servecc | Serv Locn EM Care Ctr - servlocn = “09” | 1, 0
servehsp | Serv Locn Emerg Hosp - servlocn = “07” | 1, 0
servhmhl | Serv Locn Home Hlth - servlocn = “12” | 1, 0
servhome | Serv Locn Home - servlocn = “04” | 1, 0
serviane | Serv Locn Inpat Anes - servlocn = “15” | 1, 0
servih | Serv Locn Inpat Hosp - servlocn = “01” | 1, 0
servilab | Serv Locn Indep Lab - servlocn = “08” | 1, 0
servlocn | SERVICE LOCATION | 00-16
servnurs | Serv Locn Nurs Home - servlocn = “05” | 1, 0
servoane | Serv Locn Outpat Anes - servlocn = “14” | 1, 0
servoff | Serv Locn Office - servlocn = “03” | 1, 0
servoh | Serv Locn Outpat Hosp - servlocn = “02” | 1, 0
servothl | Serv Locn Other locn - servlocn = “10” | 1, 0
servphar | Serv Locn Pharmacy - servlocn = “13” | 1, 0
servsnf | Serv Locn SNF - servlocn = “06” | 1, 0
servtype | SERV TYPE | list provided by source
sex | sex | 1, 2, 9
sexpatn | PATIENT SEX | 1, 2, 9
sprnankl | Sprains and strains of ankle - dxvar: 8450 | 1, 0
sprnkne | Sprains and strains of knee and leg - dxvar: 844 | 1, 0
sprnneck | Sprains and strains of neck - dxvar 8470 | 1, 0
sprnobk | oth sprains and strains of back - dxvar: 846, 8471-8479 | 1, 0
sprnostr | oth sprains and strains nos - 840-8419, :(843, 8451, 848, :842) | 1, 0
sprnwrst | Sprains and strains of wrist and hand | 1, 0
sqech1 | ‘square high chg’ | 1, 0
sqech2a | ‘Square adj chg’ | 1, 0
sqech2b | ‘Square not ADJacent chg’ | 1, 0
sqexc2a | ‘Square adj pay’ | 1, 0
sqexc2b | ‘Square not ADJacent pay’ | 1, 0
sqexch1 | ‘square high pay’ | 1, 0
sqnewchg | ‘Square 10mos Bchg’ | 1, 0
sqnewpy | ‘Square 10mos Bpay’ | 1, 0
ssn | Enrollee SS number | 1, 0
statact | Enrollment Type Active, status = ‘00’ | 1, 0
statcobr | Enrollment Type Cobra, status = 15, 16, 17, 18, 19 | 1, 0
statlife | Enrollment Type Life Only | 1, 0
statltd | Enrollment Type LTD, status = 50 | 1, 0
statmult | Enrollment Type Multiple - if sum(statact, statss, statpens, statltd, statcobr, statlife) > 1 then statmult = 1; | 1, 0
statpens | Enrollment Type Pensioner - status = 10 | 1, 0
statss | Enrollment Type Surv Spouse - status = 01 | 1, 0
status | STATUS | ‘00 Active’, ‘01 Surv Spouse’, ‘10 Pensioner’, ‘12 LTD’, ‘15 Cobra’, ‘16 Cobra’, ‘17 Cobra’, ‘18 Cobra’, ‘19 Cobra’, ‘50 Life Only’
stomulcr | Ulcer of stomach and small intestine - 531-5349 | 1, 0
strep | Streptococcal sore throat - dxo340 | 1, 0
stress | Acute reaction to stress and adjustment reaction - ‘308’ <= dxvar <= ‘3099’ | 1, 0
struck | Striking against or struck accidentally by objects or person | 1, 0
suprcorn | Superficial injury of cornea - dxvar 9181
surgpath | surgical path levels 4, 5, 6, cpt in (‘88305’, ‘88307’, ‘88309’) | 1, 0
syminteg | Symptoms involving skin and oth integumentary tissue - dxvar: 782 | 1, 0
sympurin | Symptoms involving urinary system - dxvar: 788 | 1, 0
syncope | Syncope and collapse - dxvar 7802 | 1, 0
synovit | Synovitis and tenosynovitis - dxvar: 7270 | 1, 0
teeth | Dis of the teeth and supporting structures - dxvar 520-5259 | 1, 0
temppace | Temporary pacer placement, cpts{i} in (‘33210’, ‘33211’) | 1, 0
tenmoch | Average from the sum of all months in the base year excluding the 2 highest months, per day
transfus | Transfusion Medical, ‘86850’ <= cpts{i} <= ‘86999’ | 1, 0
trantype | Transplant donor and genetic typing, ‘86812’ <= cpts{i} <= ‘86817’ | 1, 0
units | UNITS COUNT
urticari | Urticaria - dxvar: 708 | 1, 0
uti_unsp | Urinary tract infection, site not specified - dxvar 5990 | 1, 0
violenc | oth causes of violence - dxvar E970-E978, E990-E999 | 1, 0
virlwart | Viral warts - dxvar =: ‘0781’ | 1, 0
wbasechg | ‘basechg present’ | 1, 0
wbasepy | ‘basepmt present’ | 1, 0
zipenrl | ENROLLEE ZIP CODE | 5 digit
zipprov | PROVIDER ZIP CODE | 5 digit
Confidential information may be encrypted to protect the identity and privacy of individuals.
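Most flags in the glossary above are 1/0 indicators defined by string tests on a person's CPT and diagnosis codes. The Python sketch below shows how a handful of them could be evaluated for one person's base-period claims. It is an illustration only, not the program in the listing appendix: it assumes codes are stored as strings without decimal points, uses plain lexicographic comparison to mirror tests such as `"33" <= substr(cpts{i}, 1, 2) <= "37"`, and the helper names (`cpt_prefix_between`, `code_between`, `code_starts_with`, `encode_claim_flags`) are hypothetical.

```python
from typing import Dict, Iterable, Sequence


def cpt_prefix_between(cpts: Iterable[str], low: str, high: str) -> int:
    """1 if any CPT code satisfies  low <= substr(code, 1, 2) <= high."""
    return int(any(low <= code[:2] <= high for code in cpts))


def code_between(codes: Iterable[str], low: str, high: str) -> int:
    """1 if any code satisfies  low <= code <= high  (plain string comparison)."""
    return int(any(low <= code <= high for code in codes))


def code_starts_with(codes: Iterable[str], prefixes: Sequence[str]) -> int:
    """1 if any code starts with one of the given prefixes."""
    return int(any(code.startswith(tuple(prefixes)) for code in codes))


def encode_claim_flags(cpts: Sequence[str], dxs: Sequence[str]) -> Dict[str, int]:
    """Evaluate a few of the glossary's 1/0 flags for one person's base-period claims."""
    return {
        "scptcard": cpt_prefix_between(cpts, "33", "37"),   # Surg CPT Card Vasc
        "scptdgst": cpt_prefix_between(cpts, "40", "49"),   # Surg CPT Digest
        "radnuc": code_starts_with(cpts, ("78", "79")),     # Nuclear Medicine
        "radther": code_starts_with(cpts, ("77",)),         # radiation therapy
        "temppace": int(any(code in ("33210", "33211") for code in cpts)),
        "retinldt": code_between(dxs, "361", "3629"),       # retinal disorders
        "stress": code_between(dxs, "308", "3099"),         # stress reaction
    }


flags = encode_claim_flags(cpts=["33210", "78300"], dxs=["3612"])
print(flags)
```

Because the comparisons are on strings, a range such as `'361' <= dxvar <= '3629'` includes four-digit subcodes like `3612` without any numeric parsing, which matches how the glossary writes its ranges.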

1. A computer-implemented process of developing a person-level cost model for forecasting future costs attributable to claims from members of a book of business, where person-level data regarding actual base period health care claims are available for a substantial portion of the members of the book of business for an actual underwriting period, and the forecast of interest (i.e., future claim amount) is for an actual policy period which can be, but is not necessarily, contiguous with the actual underwriting period, comprising the steps of: providing development universe data comprising person-level enrollment data, historical base period health care claims data and historical next period claim amount data for a statistically meaningful number of individuals, where the person-level data on a health care claim comprises at least a claim code and a claim amount; providing at least one claim-based risk factor for each historical base period claim based on the claim code associated with the health care claim and providing at least one enrollment-based risk factor based on the enrollment data; and developing a cost forecasting model by capturing the predictive ability of the main effects and interactions of claim-based risk factors and enrollment-based risk factors with the development universe data through the application of an interaction capturing technique to the development universe data.
2. The computer-implemented process of claim 1, wherein the interaction capturing technique is selected from the group consisting of median regression tree techniques, least squares regression tree techniques, rule induction techniques, ordinary least squares regression techniques, median regression techniques, robust regression techniques, genetic algorithms, rule induction, clustering techniques and neural network techniques.
3. The computer-implemented process of claim 1, wherein the person-level next period cost forecasts are adjusted by modifying the extant cost forecast by the expected cost trend.
4. The computer-implemented process of claim 1, wherein the data from the claims used as predictors consist essentially of the claim- and enrollment-based risk factors, the claim amount is a standardized cost of services provided, and the model is used to allocate prospective payments to health care providers.
5. The computer-implemented process of claim 1, wherein the data used from the claims data consist essentially of the claim code and selected mandatory procedures, the claim amount is a standardized cost of services provided during the same time period as the base period, and the model is used to evaluate the efficiency of health care providers.
6. The computer-implemented process of claim 1, further comprising a computer-implemented process of forecasting future claim amounts attributable to claims from members of a book of business for an actual policy period, wherein the model development universe comprises data from the members of a book of business to be insured, further comprising: applying the cost-forecasting model to the actual underwriting period person-level data of each of the members of the book of business to generate a person-level actual policy period cost forecast for each member of the book of business; and producing a group-level forecast for the actual underwriting period from the person-level forecasts of each member of the group by totaling the person-level actual policy period cost forecasts for the group for the policy period.
7. The computer-implemented process of claim 6, comprising in addition the step of: setting insurance reserves based on the group-level forecast for the actual policy period, wherein the policy period is a reserving period for claims that have not occurred or that have occurred but have not been reported.
8. The computer-implemented process of claim 6, wherein claim amounts are a mix of fee-for-service payments and capitation payments, so that the base and underwriting period risk factors are appended to include dummy variables for the presence of capitation payments by provider type, and the cost estimate in the next and policy periods is the fee-for-service cost that must be supplemented with the expected capitation payments.
9. The computer-implemented process of claim 6, wherein the cost forecast is produced for first-dollar health insurance.
10. The computer-implemented process of claim 6, wherein the cost forecast is produced for specific plus aggregate stop-loss health insurance.
11. The computer-implemented process of claim 10, wherein the cost forecast produced is for aggregate-only stop-loss health insurance.
12. The computer-implemented process of claim 10, wherein the cost forecast produced is for specific stop-loss health insurance.
13. The computer-implemented process of claim 1, wherein each of the diagnosis- and CPT-based risk factors is independent of the sequence in time of the other diagnosis- and CPT-based risk factors.
14. The computer-implemented process of claim 1, wherein the providing of risk factors for the health care claim data is substantially free of human expert interaction.
15. The computer-implemented process of claim 1, wherein capturing the predictive ability of the main effects and interactions of claim-based risk factors and enrollment-based risk factors is substantially free of human expert interaction.
16. The computer-implemented process of claim 1, comprising in addition the step of: setting medical insurance reserves through application of the health care cost forecasting model, wherein the next period is a reserving period for claim amounts that have not occurred or that have occurred but have not been reported.
17. The computer-implemented process of claim 1 for forecasting short-term disability (STD) costs, wherein a dependent measure for generating the cost forecasting model is the number of STD days in the policy period, which is weighted by the expected cost per day for the STD to produce the person-level forecast STD costs, which are summed across the group to produce the group's forecast STD cost.
18. The computer-implemented process of claim 1, for forecasting a probability of long-term disability (LTD) claims, wherein a dependent measure for generating the cost forecasting model is the probability of an LTD claim in the policy period, where the probability is weighted by the net present value of the LTD claim amount, and comprising in addition producing person-level expected LTD costs and summing the person-level expected LTD costs across the group to produce a group's expected LTD cost.
19. The computer-implemented process of claim 1 for forecasting group term life insurance costs, wherein a dependent measure for generating the forecasting model is the expected probability of death weighted by the amount of life insurance to produce the person-level expected term life insurance cost, which is summed across the group to produce the group's expected term life insurance cost.
20. The computer-implemented process of claim 1, wherein claim amounts are a mix of fee-for-service payments and capitation payments, so that the base and underwriting period risk factors are appended to include dummy variables for the presence of capitation payments by provider type.
21. A computer-implemented process of developing a hybrid person-level health care claim cost forecasting model for forecasting future medical costs attributable to health care claims from members of a book of business, where person-level data are available for a substantial portion of the members of the book of business, comprising the steps of: providing development universe data comprising person-level data for a statistically meaningful number of individuals, the person-level data comprising continuous variable data and categorical variable data; processing first the continuous variable data for each individual with a continuous processing technique that captures the predictive ability of main effects and interactions of continuous variables to generate a person-level continuous variable model; and processing the categorical variable data for each individual, including the output from the continuous processing technique, with a categorical processing technique that captures the predictive ability of main effects and interactions of categorical variables to generate a person-level categorical variable model; wherein the person-level continuous variable model and person-level categorical variable model together comprise a hybrid person-level health care claim amount forecasting model.
22. The computer-implemented process of claim 21, wherein the continuous variable data comprises data selected from the group consisting of age, length of prior enrollment, historical claim amounts, and transformations and trends in the person-level claim amounts.
23. The computer-implemented process of claim 21, wherein the categorical variable data comprises data selected from the group consisting of clinical risk factors, provider type and site of care.
24. The computer-implemented process of claim 21, wherein the continuous processing technique is selected from the group consisting of regression techniques and neural network techniques.
25. The computer-implemented process of claim 21, wherein the categorical processing technique is selected from the group consisting of median regression tree techniques, least squares regression tree techniques, rule induction techniques, and neural network techniques.
26. The computer-implemented process of claim 21, wherein the person-level data is available for a substantial portion of the members of the book of business for an actual underwriting period, the claim amounts of interest for forecasting purposes are those during an actual policy period which can be, but is not necessarily, contiguous with the actual underwriting period, and the development universe data comprises person-level data for each individual for a historical base period and a historical next period.
27. The computer-implemented process of claim 21, wherein the hybrid person-level health care claim cost forecasting model is used as an input into an interaction capturing technique that uses all of the risk factors that were meaningful in the hybrid person-level health care claim cost forecasting model to forecast future medical claim amounts.
28. A computer-implemented process of developing a claim amount forecasting model for use in forecasting the future claim amount for members of a book of business, where person-level data are available for a substantial portion of the members of the book of business for an actual base period, and the claim amount of interest for forecasting purposes is for an actual next period which can be, but is not necessarily, contiguous with the actual base period, comprising the steps of: processing the base period data having claims to generate a having-claims claim amount forecasting model; and processing the base period data without claims to generate a without-claims claim amount forecasting model, wherein the having-claims cost forecasting model and the without-claims forecasting model comprise a claim amount forecasting model.
29. A computer-implemented process of developing a health care claim amount forecasting model for use in forecasting the future medical claim amount for members of a book of business, where person-level data are available for a substantial portion of the members of the book of business for an actual base period, and the claim amount of interest for forecasting purposes is for an actual next period which can be, but is not necessarily, contiguous with the actual base period, comprising the steps of: providing development universe data comprising person-level data for a statistically meaningful plurality of individuals, wherein the person-level data for an individual comprises health care claims data for the individual and the data on a health care claim comprises at least a claim amount and a claim code; Winsorizing the person-level data to yield inlier data and outlier data; processing the inlier data to generate an inlier cost forecasting model; and processing the outlier data to generate an outlier cost forecasting model; wherein the results of the inlier and outlier cost forecasting models, in combination, produce a person-level claim amount forecast model.
30. The computer-implemented process of claim 29, further comprising: Winsorizing the inlier data to yield inlier data having claims and inlier data without claims; processing the inlier data having claims to generate an inlier-having-claims claim amount forecasting model; and processing the inlier data without claims to generate an inlier-without-claims claim amount forecasting model, wherein the inlier-having-claims cost forecasting model and the inlier-without-claims forecasting model comprise an inlier claim amount forecasting model.
31. A computer-implemented process of forecasting a claim amount attributable to claims from members of a book of business during an actual policy period, comprising the steps of: providing person-level data, comprising enrollment data for members of a book of business to be insured for an actual underwriting period that can be, but is not necessarily, contiguous with the actual policy period; providing a model development universe of person-level data, comprising enrollment data from the historical base period and historical next period health care claims data for a statistically meaningful number of individuals; providing enrollment-based risk factors for each historical base period and providing next period claim amounts; developing a health care cost-forecasting model for the enrollment data by capturing the predictive ability of main effects and interactions of enrollment-based risk factors through the application of an interaction capturing technique to the model development universe; applying the health care cost-forecasting model to the person-level underwriting period enrollment data of each of the members of the book of business to generate a person-level expected cost forecast for the policy period for each member of the book of business; and producing a group-level forecast for the expected cost of the policy period from the person-level forecasts of each person of the group by totaling the person-level expected cost forecasts for the actual policy period.
32. A computer-implemented process of forecasting costs attributable to claims from members of a book of business during an actual policy period, comprising the steps of: providing person-level data, comprising enrollment data and actual underwriting period health care claims data, for members of a book of business, where the person-level data on a health care claim comprises at least a claim amount and a claim code and the actual underwriting period can be, but is not necessarily, contiguous with the actual policy period; providing a model development universe of person-level data, comprising enrollment data, historical base period health care claims data and historical next period claim amount data for a statistically meaningful number of individuals, where the person-level data on a base period health care claim includes at least a claim amount and a claim code; providing claim-based risk factors for each historical base period based on the claim code associated with the health care claim and providing at least one enrollment risk factor based on the enrollment data; developing a cost-forecasting model by capturing the predictive ability of main effects and interactions of risk factors through the application of an interaction capturing technique to the model development universe; applying the cost-forecasting model to the person-level data of each of the individuals or members of a group to generate a person-level actual policy period expected cost forecast for each member of the group; and producing a group-level forecast for the actual policy period from the person-level forecasts of each individual or member of the group by totaling the person-level cost forecasts for the actual policy period.
33. The computer-implemented process of claim 32, comprising in addition the step of: setting claim amount reserves based on the individual or group-level forecast, wherein the next period is a reserving period for claims that have not occurred or that have occurred but have not been reported.
34. The computer-implemented process of claim 32 for forecasting short-term disability costs, wherein the interaction capturing technique uses a dependent measure from the next period and policy period comprising the number of STD days in the policy period and weights the dependent measure by the expected cost per day for the STD to produce the person-level expected STD costs, which are summed across the group to produce the group's expected STD cost.
35. The computer-implemented process of claim 32, for forecasting a probability of long-term disability (LTD) claims, wherein a dependent measure for generating the cost forecasting model is the probability of an LTD claim in the policy period, where the probability is weighted by the net present value of the LTD claim, and applying the cost forecasting model to the person-level data produces person-level expected LTD costs, which are summed across the group to produce a group's expected LTD cost for an actual policy period.
36. The computer-implemented process of claim 32, wherein the cost forecast is produced for first-dollar health insurance.
37. The computer-implemented process of claim 32, wherein the cost forecast is produced for specific plus aggregate stop-loss health insurance.
38. The computer-implemented process of claim 32, wherein the cost forecast produced is for aggregate-only stop-loss health insurance.
39. The computer-implemented process of claim 32, wherein the cost forecast produced is for specific stop-loss health insurance.
40. The computer-implemented process of claim 32 for forecasting group term life insurance costs, wherein a dependent measure for generating the cost forecasting model is the expected probability of death weighted by the amount of life insurance to produce the person-level expected term life insurance cost, which is summed across the group to produce the group's expected term life insurance cost.
41. The computer-implemented process of claim 32, wherein claim amounts are a mix of fee-for-service payments and capitation payments, so that the base and underwriting period risk factors are appended to include dummy variables for the presence of capitation payments by provider type, and the cost estimate in the next and policy periods is the fee-for-service cost that must be supplemented with the expected capitation payments.
42. The process of claim 32, further comprising developing a group-level cost-forecasting model for groups in the book of business by capturing the predictive ability of main effects and interactions of group-level risk factors, which include but are not limited to the group's historical claim amounts, the group-level sum of the person-level forecasts, SIC code or industry type, characteristics of the benefit plan design, geographic locale, and the number of people and length of time covered by the insurance, through the application of an interaction capturing technique to the model development universe of groups.
43. The computer-implemented process of claim 42, comprising in addition the step of: setting medical insurance reserves based on the group-level forecast, wherein the next period is a reserving period for claims that have not occurred or that have occurred but have not been reported.
44. The computer-implemented process of claim 42 for forecasting short-term disability costs, wherein the interaction capturing technique uses a group-level dependent measure of residual STD days to calculate group-level forecast STD costs by weighting by the group's expected STD cost per day.
45. The computer-implemented process of claim 42, wherein medical claim amounts are a mix of fee-for-service payments and capitation payments, so that the base and underwriting period group-level risk factors are appended to include dummy variables for the presence of capitation payments by provider type, and the cost estimate in the next and policy periods is the fee-for-service cost that must be supplemented with the expected capitation payments.
46. The process of claim 32, comprising in addition the steps of: providing a provider type cost trend forecast adjustment to be utilized by at least one member of the group to be insured; and adjusting the person-level next period cost forecast for each member using the health care provider type with the provider type cost trend forecast adjustment.
47. An automated system for forecasting future costs attributable to claims from members of a book of business during an actual policy period, comprising: a central processing unit; an insured person database, accessible by the processor, wherein the database comprises person-level enrollment data and actual underwriting period health care claims data for members of a book of business to be insured, where the person-level data on a health care claim comprises at least a claim amount and a claim code; a model development universe database, accessible by the processor, wherein the second database comprises a model development universe of person-level data, comprising enrollment data, historical base period health care claims data and historical next period claim amount data for a statistically meaningful number of individuals, where the person-level data on the base period health care claim includes at least a claim amount and a claim code; a risk factor encoder, accessible by the processor, wherein the risk factor encoder encodes claim-based risk factors for each historical base period based on the claim code associated with the health care claim and the risk factor encoder encodes at least one enrollment risk factor based on the enrollment data; a model generator, accessible by the processor, that generates a cost-forecasting model by capturing the predictive capacity of the main effects and the interaction of the risk factors assigned by the risk factor encoder to forecast the historical next period of the model development universe data using the historical base period data; a person-level cost generator that applies the cost-forecasting model to the person-level actual underwriting period health care claims data of each of the members of the book of business to generate a person-level actual policy period claim amount forecast for each member of the book of business; and an actual policy period group-level cost forecast generator that totals the person-level actual next period forecasts for each member of the group to generate an actual policy period group-level cost forecast.
48. The system of claim 47, wherein the model generator captures the predictive ability of main effects and interactions of group-level risk factors, which include but are not limited to the group's historical claim amounts, the group-level sum of the person-level forecasts, SIC code or industry type, characteristics of the benefit plan design, geographic locale, and the number of people and length of time covered by the insurance, through the application of an interaction capturing technique to the model development universe of groups.
49. A computer-implemented process of forecasting costs attributable to claims from members of a book of business during an actual policy period, comprising the steps of: means for providing person-level data, comprising enrollment data and actual underwriting period health care claims data, for members of a book of business, where the person-level data on a health care claim comprises at least a claim amount and a claim code and the actual underwriting period can be, but is not necessarily, contiguous with the actual policy period; means for providing a model development universe of person-level data, comprising enrollment data, historical base period health care claims data and historical next period claim amount data for a statistically meaningful number of individuals, where the person-level data on a base period health care claim includes at least a claim amount and a claim code; means for providing claim-based risk factors for each historical base period based on the claim code associated with the health care claim and providing at least one enrollment risk factor based on the enrollment data; means for developing a cost-forecasting model by capturing the predictive ability of main effects and interactions of risk factors through the application of an interaction capturing technique to the model development universe; means for applying the cost-forecasting model to the person-level data of each of the individuals or members of a group to generate a person-level actual policy period expected cost forecast for each member of the group; and means for producing a group-level forecast for the actual policy period from the person-level forecasts of each individual or member of the group by totaling the person-level cost forecasts for the actual policy period.
50. The system recited in claim 49, wherein the system is further automated such that when actual underwriting period data is provided, the system automatically provides an actual policy period claim amount forecast.
51. The system recited in claim 49 for use by a client having data and an Internet client application, further comprising an Internet server application such that when the client provides actual underwriting period data to the Internet server application, the Internet server application automatically provides an actual policy period claim amount forecast.
52. A group insurance product comprising: an identification of the types of benefits which are agreed to be provided by an insurer to or on behalf of members of a group, which will be incurred by members of said group during a future time period; and a stated monetary insurance premium including a forecast of said benefits made in accordance with the process of claim 32, estimated costs of administering the insurance product, and optionally, an estimated profit, whereby an insurer agrees to cover the identified benefits in exchange for the payment of the stated monetary insurance premium.
53. The group health insurance product of claim 52 for insuring short-term disability costs, wherein the interaction capturing technique uses a dependent measure from the next period and policy period comprising the number of STD days in the policy period and weights the dependent measure by the expected cost per day for the STD to produce the person-level expected STD costs, which are summed across the group to produce the group's expected STD cost.
54. The group health insurance product of claim 52 for insuring long-term disability (LTD) claims, wherein a dependent measure for generating the claim amount forecasting model is the probability of an LTD claim in the policy period, where the probability is weighted by the net present value of the LTD claim, and applying the cost forecasting model to the person-level data produces person-level expected LTD costs, which are summed across the group to produce a group's expected LTD cost for an actual policy period.
55. The group health insurance product of claim 52, wherein the cost forecast is produced for first-dollar health insurance.
56. The group health insurance product of claim 52, wherein the cost forecast is produced for specific plus aggregate stop-loss health insurance.
57. The group health insurance product of claim 52, wherein the cost forecast produced is for aggregate-only stop-loss health insurance.
58. The group health insurance product of claim 52, wherein the cost forecast produced is for specific stop-loss health insurance.
59. The group health insurance product of claim 52 for insuring group term life insurance costs, wherein a dependent measure for generating the cost forecasting model is the expected probability of death weighted by the amount of life insurance to produce the person-level expected term life insurance cost.
60. The group health insurance product of claim 52, comprising a renewal product, wherein the model development universe comprises data from the members of a group in the book of business to be insured.
61. A method of reserving for the group health insurance product of claim 48, comprising in addition the step of: setting insurance reserves based on the renewal group-level forecast for the actual underwriting period, wherein the next period is a reserving period for claims that have not occurred or that have occurred but have not been reported.
62. A method of pricing group insurance including a cost of future benefits according to the computer-implemented process of forecasting future medical costs attributable to claims from members of a group during an actual underwriting period of claim 32, comprising the additional steps of: providing an expected amount of administrative costs allocable to providing health insurance coverage to the group; providing a minimum acceptable expected profit; totaling the group-level cost forecast, the expected amount of administrative costs, and the minimum acceptable expected profit to yield a total minimum price; providing a plurality of expected probabilities of retention for the group corresponding to a plurality of possible prices greater than or equal to the total minimum price, each possible price also having an expected profit that is the amount by which the price exceeds the group-level cost forecast plus the expected amount of administrative costs; and calculating a plurality of possible maximum profits by multiplying each of the plurality of possible profits by the corresponding expected probability of retention, wherein the largest possible maximum profit is used to price the group insurance.
63. A method of pricing group insurance of claim 62 for insuring short-term disability costs, wherein the interaction capturing technique uses a dependent measure from the next period and policy period comprising the number of STD days in the policy period and weights the dependent measure by the expected cost per day for the STD to produce the person-level expected STD costs, which are summed across the group to produce the group's expected STD cost.
64. A method of pricing group insurance of claim 62 for insuring long-term disability (LTD) claims, wherein a dependent measure for generating the cost forecasting model is the probability of an LTD claim in the policy period, where the probability is weighted by the net present value of the LTD claim, and applying the cost forecasting model to the person-level data produces person-level expected LTD costs, which are summed across the group to produce a group's expected LTD cost for an actual policy period.
65. A method of pricing group insurance of claim 62, wherein the pricing is produced for first-dollar health insurance.
66. A method of pricing group insurance of claim 62, wherein the pricing is produced for stop loss health insurance.
67. A method of pricing group insurance of claim 62, wherein the pricing produced is for aggregate-only stop loss health insurance.
68. A method of pricing group insurance of claim 62, wherein the pricing produced is for specific stop loss health insurance.
69. A method of pricing group insurance of claim 62 for insuring group term life insurance costs, wherein a dependent measure for generating the cost forecasting model is the expected probability of death weighted by the amount of life insurance to produce the person-level expected term life insurance cost.
70. A method of pricing group insurance of claim 62, comprising a renewal product, wherein the model development universe comprises data from the members of a group in the book of business to be insured.
71. A method of underwriting an insurance product comprising the steps of: providing an identification of the coverage of the insurance product which identifies the conditions of payment under the product during a policy period; providing person-level health care claim information comprising enrollment data, and base period and underwriting period claim data, the claim data comprising claim codes having associated claim costs; capturing the predictive ability of the person-level health care claim information through the application of an interaction capturing technique; and forecasting a predicted cost of the insurance product during the policy period based on the identification of the coverage of the insurance product and the captured predictive ability of the person-level health care claim information; wherein each diagnosis-based and CPT-based risk factor is independent of the sequence in time of other diagnosis-based and CPT-based risk factors.
72. The method of underwriting an insurance product of claim 71, for insuring short term disability (STD) costs, wherein the interaction capturing technique uses a dependent measure from the next period and policy period comprising the number of STD days in the policy period and weights the dependent measure by the expected cost per day for the STD to produce the person-level expected STD costs, which are summed across the group to produce the group's expected STD cost.
73. The method of underwriting an insurance product of claim 71, for insuring long term disability (LTD) claims, wherein a dependent measure for generating the cost forecasting model is the probability of an LTD claim in the policy period, where the probability is weighted by the net present value of the LTD, and applying the cost forecasting model to the person-level data produces person-level expected LTD costs, wherein the person-level expected LTD costs are summed across the group to produce a group's expected LTD cost for an actual policy period.
74. The method of underwriting an insurance product of claim 71, wherein the cost forecast is produced for first-dollar health insurance.
75. The method of underwriting an insurance product of claim 71, wherein the cost forecast is produced for stop loss health insurance.
76. The method of underwriting an insurance product of claim 71, wherein the cost forecast produced is for aggregate-only stop loss health insurance.
77. The method of underwriting an insurance product of claim 71, wherein the cost forecast produced is for specific stop loss health insurance.
78. The method of underwriting an insurance product of claim 71 for insuring group term life insurance costs, wherein a dependent measure for generating the cost forecasting model is the expected probability of death weighted by the amount of life insurance to produce the person-level expected term life insurance cost.
79. The method of underwriting an insurance product of claim 71, comprising renewal underwriting, wherein the model development universe comprises data from the members of a group in the book of business to be insured.
80. The method of underwriting an insurance product of claim 71, comprising in addition the step of: setting insurance reserves based on the renewal group-level forecast for the actual underwriting period, wherein the next period is a reserving period for claims that have not occurred or that have occurred but have not been reported.
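Once a forecasting model has produced person-level measures, the pricing rule of claim 62 and the person-level weighting of claims 63, 64 and 69 (and their counterparts 72, 73 and 78) reduce to simple arithmetic. The following is a minimal sketch under stated assumptions: candidate prices are supplied as a discrete list paired with externally estimated retention probabilities, and the function names `group_expected_cost` and `choose_price` are hypothetical, not taken from the program listing appendix. The interaction capturing technique that produces the person-level measures is not shown.

```python
from typing import Optional, Sequence, Tuple


def group_expected_cost(measures: Sequence[float], weights: Sequence[float]) -> float:
    """Sum person-level expected costs across a group (hypothetical helper).

    Each member's expected cost is a forecast dependent measure times a weight:
      - STD: predicted STD days x expected cost per day
      - LTD: probability of an LTD claim x net present value of the LTD
      - term life: probability of death x amount of life insurance
    """
    if len(measures) != len(weights):
        raise ValueError("measures and weights must have the same length")
    return sum(m * w for m, w in zip(measures, weights))


def choose_price(
    cost_forecast: float,
    admin_cost: float,
    min_profit: float,
    candidates: Sequence[Tuple[float, float]],
) -> Tuple[float, float]:
    """Pick the candidate price with the largest retention-weighted profit.

    `candidates` holds (price, retention_probability) pairs; this input
    format is an assumption for illustration. Returns
    (price, retention_weighted_profit).
    """
    # Total minimum price = cost forecast + administrative costs + minimum profit.
    min_price = cost_forecast + admin_cost + min_profit
    best: Optional[Tuple[float, float]] = None
    for price, p_retain in candidates:
        if price < min_price:
            continue  # only prices >= the total minimum price are considered
        # Profit is the amount by which the price exceeds cost forecast + admin costs.
        profit = price - (cost_forecast + admin_cost)
        weighted = profit * p_retain
        if best is None or weighted > best[1]:
            best = (price, weighted)
    if best is None:
        raise ValueError("no candidate price reaches the total minimum price")
    return best
```

With illustrative (not source-derived) figures of a 1000 cost forecast, 100 in administrative costs and a 50 minimum profit, the total minimum price is 1150; among candidates (1150, 0.9), (1200, 0.8), (1300, 0.5) and (1400, 0.3), the price of 1300 gives the largest retention-weighted profit, 200 × 0.5 = 100.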